Executive Summary


This report originated in the $H^\infty$ Research Initiative of the Office of Naval Research
and the ILIR Program of SPAWAR Systems Center San Diego. These programs
migrated $H^\infty$ Engineering into fleet applications, specifically wideband impedance
matching and wideband amplifier optimization. Research in these applications
produced several papers [24], [23], [3], [4], four patents, and a book [2], and sparked the
Defense Advanced Research Projects Agency's interest in Digital $H^\infty$ Engineering.

As the applications coalesced, a general principle underlying these optimization
problems became apparent: their solutions can be characterized by the Kolmogorov
Criterion. This report makes explicit that the Kolmogorov Criterion can be
specialized in sufficient detail to yield concrete and computationally viable
tests that identify solutions to difficult optimization problems.

Specifically, the classical "equal-ripple" characterization of best polynomial
approximation is generalized to nonlinear polynomial optimization, and then
generalized again to multiobjective polynomial optimization. Thus, results in polynomial
optimization stretching over the last century fit into a single framework and
are illustrated with applications in filter design and control theory. In addition to the
finite-dimensional polynomials, the Kolmogorov Criterion also applies to the
infinite-dimensional disk algebra. The disk algebra is basic to signal processing and control
theory. Many engineering problems in these disciplines are optimization problems on
the disk algebra. The Kolmogorov Criterion readily characterizes the minimizers of
these nonlinear optimization problems.

By making explicit the Kolmogorov Criterion and working specific examples, this
report equips researchers with a general approach to optimization on spaces of
functions and a collection of accessible research problems.
1 The Mathematical Summary


Let $Z$ be a compact subset of the complex numbers $\mathbb{C}$. Let $C(Z, \mathbb{C})$ denote the
complex-valued functions that are continuous on $Z$. A real-valued function
$\Gamma : Z \times \mathbb{C} \to \mathbb{R}$ is called a performance function. A continuous performance
function induces a continuous objective function $\gamma : C(Z, \mathbb{C}) \to \mathbb{R}$:

$$\gamma(h) := \sup\{\Gamma(z, h(z)) : z \in Z\}.$$

Let $\mathcal{H}$ denote a subset of $C(Z, \mathbb{C})$. Minimization of this objective function $\gamma$ on $\mathcal{H}$ is
the general optimization problem:

$$\inf\{\gamma(h) : h \in \mathcal{H} \subseteq C(Z, \mathbb{C})\}.$$

Important for both theory and computation is recognizing solutions to this
minimization problem. Specifically, if you were handed a minimizer

$$h_{\min} := \operatorname{argmin}\{\gamma(h) : h \in \mathcal{H}\},$$

could you recognize that $h_{\min}$ was a minimum of $\gamma$? Recognizing such minimizers
is the characterization problem. The multiobjective characterization problem has
$\Gamma : Z \times \mathbb{C} \to \mathbb{R}_+^M$ and minimizes the corresponding vector-valued function:

$$\gamma(h) := \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \\ \vdots \\ \gamma_M(h) \end{bmatrix}, \qquad \gamma_m(h) := \sup\{\Gamma_m(z, h(z)) : z \in Z\}.$$


We consider the characterization problem for the following subspaces:

• Polynomials $\mathcal{P}_N$

• Disk algebra $\mathcal{A}(\mathbb{D})$


The  Kolmogorov  Criterion  provides  an  easy  route  to  the  necessary  conditions  that 
characterize  a  minimizer  while  the  interpolating  properties  of  the  subspaces  complete 
the  sufficiency  arguments. 

For optimization on the polynomials, a new characterization of polynomial
minimizers is obtained. This characterization is a substantial extension of the well-known
"equal-ripple" theorem of polynomial approximation [11]. Applications to nonlinear
approximation and spectral factorization illustrate this result.

For optimization on the infinite-dimensional disk algebra, we recapture Helton and
Merino's [17] flatness and winding number characterization of minimizers. The point
of this recapitulation is to show the flexibility of the Kolmogorov approach. Applications
to impedance matching and control theory illustrate the characterization and bring
us to minimizing multiple objective functions.

For multiobjective optimization, a new identification of polynomial minimizers is
obtained by spreading the equal-ripple result over the multiple objective functions.
However, a "phase-splitting" phenomenon confounds the sufficiency argument.
Nevertheless, this new theory is sufficient to explore the set of all possible minima and
uncover a surprisingly fine structure.

In summary, the Kolmogorov Criterion is a computational framework for exploring
optimization theory in general, with sufficient detail to deliver specific results on
optimization on function spaces.


2  Notation  and  Preliminaries 

The real numbers are denoted by $\mathbb{R}$. Real $N$-dimensional space is denoted by $\mathbb{R}^N$.
The closed positive cone of $\mathbb{R}^N$ is denoted by $\mathbb{R}_+^N$. The complex numbers are denoted
by $\mathbb{C}$. Complex $N$-dimensional space is denoted by $\mathbb{C}^N$. Throughout this report, $Z$
denotes a compact subset of $\mathbb{C}$. The open unit disk

$$\mathbb{D} := \{z \in \mathbb{C} : |z| < 1\}$$

has the unit circle $\mathbb{T}$ as boundary

$$\mathbb{T} := \{z \in \mathbb{C} : |z| = 1\}.$$

If $E$ is a Banach space with norm $\|\cdot\|_E$, $C(Z, E)$ denotes the set of continuous
functions $h : Z \to E$ with norm

$$\|h\|_\infty := \sup\{\|h(z)\|_E : z \in Z\}.$$


In  more  detail, 

• $C(Z, \mathbb{R})$ denotes the set of continuous real-valued functions on $Z$

• $C(Z, \mathbb{C})$ denotes the set of continuous complex-valued functions on $Z$

• $C(Z, \mathbb{C}^M)$ denotes the continuous $\mathbb{C}^M$-valued functions on $Z$

If $\Gamma \in C(Z \times \mathbb{C}, \mathbb{R})$, it lifts to the mapping $\hat{\Gamma} : C(Z, \mathbb{C}) \to C(Z, \mathbb{R})$

$$\hat{\Gamma}(h; z) = \Gamma(z, h(z))$$

that induces the objective function $\gamma : C(Z, \mathbb{C}) \to \mathbb{R}$

$$\gamma(h) := \sup\{\Gamma(z, h(z)) : z \in Z\}.$$

The associated critical set of $h \in C(Z, \mathbb{C})$ is denoted

$$\operatorname{crit}[\gamma(h)] := \{z \in Z : \gamma(h) = \Gamma(z, h(z))\}.$$
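The objective function and its critical set can be approximated on a finite sample of $Z$. The following is a minimal sketch under stated assumptions: the helper name `objective_and_critical_set`, the grid, the tolerance, and the sample performance function $\Gamma(x, w) = (x^2 - w)^2$ are illustrative choices and do not come from the report.

```python
def objective_and_critical_set(perf, h, grid, rel_tol=1e-9):
    """Approximate gamma(h) = sup Gamma(z, h(z)) and crit[gamma(h)] on a finite grid."""
    values = [perf(z, h(z)) for z in grid]
    gamma = max(values)
    tol = rel_tol * max(1.0, abs(gamma))
    # Grid points where the performance function attains the sup (up to tol).
    crit = [z for z, v in zip(grid, values) if gamma - v <= tol]
    return gamma, crit

# Illustrative data (assumed): Z = [0, 1], Gamma(x, w) = (x^2 - w)^2, h(x) = x - 1/8.
grid = [k / 1000 for k in range(1001)]
perf = lambda x, w: (x * x - w) ** 2
h = lambda x: x - 0.125

gamma, crit = objective_and_critical_set(perf, h, grid)
print(gamma, crit)
```

On a grid the critical set is only resolved up to the chosen tolerance, so this is a numerical stand-in for the exact set defined above.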

If $\mathcal{H}$ is a subspace of $C(Z, \mathbb{C})$, a nonzero $\Delta h \in \mathcal{H}$ is called a direction of nonincrease
[9] or direction of descent for $\gamma$ provided for all $t > 0$ sufficiently small

$$\gamma(h + t\Delta h) \le \gamma(h).$$

The function $h \in \mathcal{H}$ is called a local minimum for $\gamma$ provided for all $\Delta h \in \mathcal{H}$ sufficiently
small there holds

$$\gamma(h + \Delta h) \ge \gamma(h).$$

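A Taylor's expansion in $C(Z, \mathbb{C})$ is needed. Following Helton's notation, recall the
derivative on $\mathbb{C}$ has the form [22]

$$\partial = \frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} + \frac{1}{i}\frac{\partial}{\partial y}\right).$$

If $\Gamma : \mathbb{C} \to \mathbb{R}$ is $C^2$, Taylor's expansion is

$$\begin{aligned}
\Gamma(h + \Delta h) &= \Gamma(h) + \frac{\partial\Gamma}{\partial x}(h)\Delta u + \frac{\partial\Gamma}{\partial y}(h)\Delta v + O[|\Delta h|^2] \\
&= \Gamma(h) + 2\Re[\partial\Gamma(h)\Delta h] + O[|\Delta h|^2],
\end{aligned}$$

where $\Delta h = \Delta u + i\Delta v \in \mathbb{C}$. The Omega Lemma lifts this expansion to the
corresponding expansion for $\hat{\Gamma}$ operating on $C(Z, \mathbb{C})$.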
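The first-order term $2\Re[\partial\Gamma(h)\Delta h]$ can be checked numerically on a simple case. The sketch below assumes the illustrative function $\Gamma(w) = |w - c|^2$, for which $\partial\Gamma(w) = \overline{w - c}$ and the remainder is exactly $|\Delta h|^2$; the function names and the numbers are hypothetical.

```python
def perf(w, c):
    """Illustrative performance function Gamma(w) = |w - c|^2 (an assumption)."""
    return abs(w - c) ** 2

def d_perf(w, c):
    """Complex derivative d/dw of |w - c|^2 = (w - c) * conj(w - c)."""
    return (w - c).conjugate()

c = 0.3 + 0.2j
h = 1.0 - 0.5j
dh = 1e-3 * (0.6 + 0.8j)   # small complex increment with |dh| = 1e-3

exact = perf(h + dh, c)
first_order = perf(h, c) + 2 * (d_perf(h, c) * dh).real
remainder = exact - first_order   # second-order term, here |dh|^2

print(remainder, abs(dh) ** 2)
```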

Lemma 1 (Omega) [1] Let $E$ and $F$ be Banach spaces. Let $U \subset E$ be open. Assume
$g : U \subset E \to F$ is a $C^r$ map ($r > 0$) with first variation $Dg : E \to F$. Let $M$ be a
compact topological space. Then the map $\hat{g} : C(M, U) \to C(M, F)$ defined by

$$\hat{g}(h; m) := g(h(m))$$

is also $C^r$. The derivative of $\hat{g}$ at $h \in C(M, U)$ is denoted $D\hat{g}(h)$ and is the linear
map $D\hat{g}(h) : C(M, E) \to C(M, F)$

$$D\hat{g}(h)[\Delta h; m] := Dg(h(m))[\Delta h(m)].$$


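The only modification needed to get Taylor's expansion is to account for the fact
that the domain of $\Gamma$ is $Z \times \mathbb{C}$. Let

$$\partial_1\Gamma(z_1, z_2) := \frac{\partial\Gamma}{\partial z_1}(z_1, z_2), \qquad \partial_2\Gamma(z_1, z_2) := \frac{\partial\Gamma}{\partial z_2}(z_1, z_2).$$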
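Lemma 2 ($\hat{\Gamma}$) Let $Z \subset \mathbb{C}$ be compact. Let $U \subset \mathbb{C}$ be an open subset containing $Z$.
Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$ be $C^r$ ($r > 0$) with first variation

$$D\Gamma(z_1, z_2) = [\partial_1\Gamma(z_1, z_2) \quad \partial_2\Gamma(z_1, z_2)].$$

Then the map $\hat{\Gamma} : C(Z, \mathbb{C}) \to C(Z, \mathbb{R})$ defined by

$$\hat{\Gamma}(h; z) := \Gamma(z, h(z))$$

is also $C^r$. The derivative of $\hat{\Gamma}$ at $h \in C(Z, \mathbb{C})$ is the linear map
$D\hat{\Gamma}(h) : C(Z, \mathbb{C}) \to C(Z, \mathbb{R})$

$$D\hat{\Gamma}(h)[\Delta h; z] := 2\Re[\partial_2\Gamma(z, h(z))\Delta h(z)].$$

The Taylor expansion exists on $C(Z, \mathbb{C})$ as

$$\Gamma(z, h(z) + \Delta h(z)) = \Gamma(z, h(z)) + 2\Re[\partial_2\Gamma(z, h(z))\Delta h(z)] + O[\|\Delta h\|_\infty^2],$$

where $O[\|\Delta h\|_\infty^2]$ does not depend on $z \in Z$.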

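Proof: Let $h \in C(Z, \mathbb{C}^2)$ be written as

$$h(z) = \begin{bmatrix} h_1(z) \\ h_2(z) \end{bmatrix} = \begin{bmatrix} u_1(z) + i v_1(z) \\ u_2(z) + i v_2(z) \end{bmatrix}$$

and with the corresponding notation for $\Delta h(z)$. The Omega Lemma gives that
$\hat{\Gamma} : C(Z, \mathbb{C}^2) \to C(Z, \mathbb{R})$ defined by $\hat{\Gamma}(h; z) := \Gamma(h_1(z), h_2(z))$ is $C^r$ with derivative

$$\begin{aligned}
D\hat{\Gamma}(h)[\Delta h; z] &= D\Gamma(h(z))\Delta h(z) \\
&= \begin{bmatrix} \dfrac{\partial\Gamma}{\partial x_1}(h(z)) & \dfrac{\partial\Gamma}{\partial y_1}(h(z)) & \dfrac{\partial\Gamma}{\partial x_2}(h(z)) & \dfrac{\partial\Gamma}{\partial y_2}(h(z)) \end{bmatrix}
\begin{bmatrix} \Delta u_1(z) \\ \Delta v_1(z) \\ \Delta u_2(z) \\ \Delta v_2(z) \end{bmatrix} \\
&= 2\Re[\partial_1\Gamma(h(z))\Delta h_1(z)] + 2\Re[\partial_2\Gamma(h(z))\Delta h_2(z)].
\end{aligned}$$

Restrict $\hat{\Gamma}$ to the affine space

$$\mathcal{M} = \{\mathrm{id}\} \times C(Z, \mathbb{C}) = \left\{ \begin{bmatrix} z \\ h(z) \end{bmatrix} : h \in C(Z, \mathbb{C}) \right\}.$$

$\mathcal{M}$ has tangent space

$$T\mathcal{M} = \{0\} \times C(Z, \mathbb{C}) = \left\{ \begin{bmatrix} 0 \\ \Delta h(z) \end{bmatrix} : \Delta h \in C(Z, \mathbb{C}) \right\}.$$

$\hat{\Gamma}$ restricted to $\mathcal{M}$ has derivative

$$D(\hat{\Gamma}|_{\mathcal{M}})(h)[\Delta h; z] = D\hat{\Gamma}(h)\begin{bmatrix} 0 \\ \Delta h(z) \end{bmatrix} = 2\Re[\partial_2\Gamma(z, h(z))\Delta h(z)].$$

Taylor's expansion follows from this first variation. ///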

The end-of-proof symbol is "///." On occasion, a point $x$ will be "added" to a
subset $B$ of a vector space:

$$x + B = \{x + b : b \in B\}.$$

Likewise, the sum of sets $A$ and $B$ of a vector space is denoted

$$A + B = \{a + b : a \in A, b \in B\}.$$


Table 1: Summary of notation.

Variable                            Description
$\mathbb{R}$                        real numbers
$\mathbb{R}^N$                      real $N$-dimensional space
$\mathbb{R}_+$                      non-negative real numbers
$\mathbb{R}_+^N$                    positive cone of $\mathbb{R}^N$
$\mathbb{C}$                        complex numbers
$\mathbb{C}^N$                      complex $N$-space
$\mathbb{D}$                        open unit disk in $\mathbb{C}$
$\mathbb{T}$                        unit circle
$C(Z, E)$                           continuous $E$-valued functions on the compact set $Z$
$\mathcal{P}_N$                     real polynomials of degree not exceeding $N$
$\mathcal{A}(\mathbb{D})$           disk algebra
$\operatorname{crit}[\gamma(h)]$    critical set of $\gamma(h)$
$\bar{z}$                           complex conjugate of $z$
///                                 end-of-proof symbol
3 The Kolmogorov Criterion


The Kolmogorov Criterion characterizes optimal points of the best approximation
problem and the minimizers of convex functions. For brevity, the Kolmogorov
Criterion is stated only for the best approximation problem while the text develops the
criterion for the nonlinear minimization problems.

Theorem 1 (Kolmogorov Criterion) [7, pages 6-11]. Let $X$ be a Banach space
with dual space $X^*$. Let $K$ be a convex subset of $X$. The following are equivalent:

(a) $k_0 \in K$ is a best approximation to $x \in X$:

$$\|x - k_0\| = \inf\{\|x - k\| : k \in K\}.$$

(b) There exists an $x^* \in X^*$ that has unit norm

$$\|x^*\| = 1,$$

that supports the error function

$$\langle x^*, x - k_0 \rangle = \|x - k_0\|,$$

and belongs to the negative cone of $K$

$$0 \ge \Re[\langle x^*, k \rangle] \quad (k \in K).$$
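Condition (b) can be verified by hand in a small finite-dimensional case. The sketch below is only an illustration under assumptions that are not in the report: $X = \mathbb{R}^2$ with the Euclidean norm, $K$ the line spanned by $(1, 1)$, and $x = (2, 0)$. In that setting the best approximation is the orthogonal projection and the supporting functional is the normalized error vector.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Assumed toy data: X = R^2 (Euclidean norm), K = span{(1, 1)}, x = (2, 0).
x = (2.0, 0.0)
direction = (1.0, 1.0)

# Best approximation in K: orthogonal projection of x onto the line.
t0 = dot(x, direction) / dot(direction, direction)
k0 = tuple(t0 * d for d in direction)

# Candidate functional x*: the normalized error vector x - k0.
err = tuple(xi - ki for xi, ki in zip(x, k0))
err_norm = math.sqrt(dot(err, err))
x_star = tuple(e / err_norm for e in err)

print("unit norm:", math.sqrt(dot(x_star, x_star)))
print("supports the error:", dot(x_star, err), "vs", err_norm)
print("pairing with K:", [dot(x_star, (t, t)) for t in (-3.0, 0.5, 7.0)])
```

Because $K$ is a subspace here, the negative-cone inequality holds with equality for every $k \in K$.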


For nonlinear functions, the first variation "almost" convexifies the problem.
However, the nonlinearity splits the necessary and sufficient conditions of the Kolmogorov
Criterion. The necessary condition for optimization on $\mathcal{P}_N$ and $\mathcal{A}(\mathbb{D})$ is the easy part
of the Kolmogorov Criterion [7, pages 6-11]. Although the result holds for arbitrary
sets using tangent and contingent cones, we state it only for subspaces.

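Lemma 3 (Descent) Let $Z \subset \mathbb{C}$ be compact. Let $\mathcal{H}$ be a closed linear subspace of
$C(Z, \mathbb{C})$. Let $U$ be an open subset containing $Z$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$ be $C^2$. Define
$\gamma : \mathcal{H} \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(z, h(z)) : z \in Z\}.$$

Assume $\mathcal{H}$ is boundedly compact. If $h \in \mathcal{H}$ is not a local minimum, there exists a
nonzero $\Delta h \in \mathcal{H}$ such that

$$0 \ge \Re[\partial_2\Gamma(z, h(z))\Delta h(z)] \quad (z \in \operatorname{crit}[\gamma(h)]).$$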
Proof: If $h \in \mathcal{H}$ is not a local minimum, there exists a nonzero sequence $\{\Delta h_n\} \subset \mathcal{H}$
converging to zero such that $\gamma(h + \Delta h_n) < \gamma(h)$. Set $t_n := \|\Delta h_n\|_\infty > 0$ and
$u_n := t_n^{-1}\Delta h_n$. Bounded compactness of $\mathcal{H}$ implies that the bounded sequence $\{u_n\}$ contains
a convergent subsequence. By relabeling, let $u_n \to \Delta h \in \mathcal{H}$. Because $u_n$ has unit
norm, $\Delta h$ cannot be zero. For all $z \in \operatorname{crit}[\gamma(h)]$, Lemma 2 provides the expansion:

$$\begin{aligned}
\gamma(h + \Delta h_n) &\ge \Gamma(z, h(z) + \Delta h_n(z)) \\
&= \Gamma(z, h(z)) + 2\Re[\partial_2\Gamma(z, h(z))\Delta h_n(z)] + O[t_n^2] \\
&= \gamma(h) + 2\Re[\partial_2\Gamma(z, h(z))\Delta h_n(z)] + O[t_n^2].
\end{aligned}$$

Subtract $\gamma(h)$ from both sides, use $\gamma(h + \Delta h_n) < \gamma(h)$, and divide by $2t_n > 0$ to get

$$0 > \Re[\partial_2\Gamma(z, h(z))u_n(z)] + O[t_n].$$

Letting $n \to \infty$ gives the result. ///


The Descent Lemma (Lemma 3) has a clean proof that reveals why bounded
compactness supplies a "direction of descent." It also supplies various points of departure
for more sophisticated results. For example, a minimization test is obtained, provided
the equality in the "$\ge$" is handled with care.

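Lemma 4 (Minimum Test) Let $Z \subset \mathbb{C}$ be compact. Let $\mathcal{H}$ be a closed linear
subspace of $C(Z, \mathbb{C})$. Let $U$ be an open subset containing $Z$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$ be
$C^2$. Define $\gamma : \mathcal{H} \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(z, h(z)) : z \in Z\}.$$

Let $h \in \mathcal{H}$. If there exists a $\Delta h \in \mathcal{H}$ such that

$$0 > \Re[\partial_2\Gamma(z, h(z))\Delta h(z)] \quad (z \in \operatorname{crit}[\gamma(h)]),$$

then $h \in \mathcal{H}$ cannot be a local minimum for $\gamma$.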

Proof: Compactness of $Z$ and continuity give the existence of a $\delta > 0$ such that
$\Re[\partial_2\Gamma(z, h)\Delta h] < -\delta < 0$ on $\operatorname{crit}[\gamma(h)]$. Continuity gives an open neighborhood $\mathcal{U}$ of
$\operatorname{crit}[\gamma(h)]$ such that for all $z \in \mathcal{U}$ there holds:

$$\Re[\partial_2\Gamma(z, h)\Delta h(z)] < -\delta/2 < 0.$$

Then for $z \in \mathcal{U}$ and for $t > 0$ sufficiently small there holds

$$\begin{aligned}
\Gamma(z, h(z) + t\Delta h(z)) &= \Gamma(z, h(z)) + t\,2\Re[\partial_2\Gamma(z, h(z))\Delta h(z)] + O[t^2] \\
&< \gamma(h) - \delta t + O[t^2] \\
&< \gamma(h).
\end{aligned}$$

The first equality is obtained by taking $t > 0$ so small that $t\Delta h \in B(0, \epsilon)$ and applying
Lemma 2. The first inequality follows from the $\delta$ bound on $\mathcal{U}$. The last inequality
follows by taking $t > 0$ small enough so that the first-order term dominates the
second-order term. For $z \in Z \setminus \mathcal{U}$, continuity forces $\Gamma(z, h(z)) < \gamma(h)$. Continuity of
$\Gamma$ and compactness of $Z \setminus \mathcal{U}$ imply

$$\Gamma(z, h(z) + t\Delta h(z)) < \gamma(h)$$

for $t > 0$ sufficiently small. Thus, $\gamma(h + t\Delta h) < \gamma(h)$ for all $t > 0$ sufficiently small.
Consequently, $h$ cannot be a local minimum of $\gamma$. ///
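The Minimum Test can be watched in action on a toy problem. The sketch below assumes a real-valued setting that is not taken from the report: $Z = [0, 1]$ sampled on a grid, $\Gamma(x, w) = (x^2 - w)^2$, the candidate $h(x) = x$, and the direction $\Delta h \equiv -1$. The critical set of this $h$ is the single point $x = 1/2$, where $\partial_2\Gamma \Delta h < 0$, so small steps along $\Delta h$ should lower the objective.

```python
def gamma(h, grid):
    """Sampled objective sup (x^2 - h(x))^2 over the grid (illustrative Gamma)."""
    return max((x * x - h(x)) ** 2 for x in grid)

grid = [k / 1000 for k in range(1001)]
h = lambda x: x              # candidate that is not a minimizer
delta_h = lambda x: -1.0     # proposed descent direction

# At the critical point x = 1/2: d2Gamma = -2 (x^2 - h(x)) = 1/2 > 0.
d2_at_crit = -2.0 * (0.5 ** 2 - h(0.5))

def stepped(t):
    """Return the perturbed candidate h + t * delta_h."""
    return lambda x: h(x) + t * delta_h(x)

base = gamma(h, grid)
steps = [0.1, 0.01, 0.001]
values = [gamma(stepped(t), grid) for t in steps]
print(base, values)
```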


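The Minimum Test (Lemma 4) tells us that $h \in \mathcal{H}$ cannot be a local minimum if
we can find a $\Delta h \in \mathcal{H}$ that "interpolates" the first variation $\partial_2\Gamma(h)$ on the critical set
$\operatorname{crit}[\gamma(h)]$. Conversely, if $h \in \mathcal{H}$ is a local minimum, no such interpolator can exist.
That is, for any $\Delta h \in \mathcal{H}$, $\Re[\partial_2\Gamma(z, h(z))\Delta h(z)]$ cannot be strictly negative on all of
$\operatorname{crit}[\gamma(h)]$ and, replacing $\Delta h$ by $-\Delta h$, cannot be strictly positive either: it must
assume both nonnegative and nonpositive values. Put another way, a local minimum will
force $\partial_2\Gamma(z, h(z))\Delta h(z)$ to wind around zero. Thus, even at this abstract level, the
winding numbers appear in the characterization of minima.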

The Descent Lemma (Lemma 3) uses a "$\le$". The Minimum Test (Lemma 4) needs
a "$<$". The necessary and sufficient conditions fail on the "$=$". The bulk of our
effort is devoted to bridging this gap. The basic idea is to exploit the interpolating
properties of the subspaces. The polynomials are the classic interpolating space.


4 Optimization on $\mathcal{P}_N$

It is instructive to consider the minimization problem for the real polynomials $\mathcal{P}_N$ in
$C([0, 1], \mathbb{R})$. Let $\Gamma : [0, 1] \times \mathbb{R} \to \mathbb{R}$ be $C^2$. The complex derivative is unnecessary,
but we adapt the notation as follows:

$$\partial_2\Gamma(x_1, x_2) := \frac{\partial\Gamma}{\partial x_2}(x_1, x_2).$$

The open set condition in Lemma 2 becomes $[0, 1] \subset U \subset \mathbb{R}$ so that non-differentiability
at 0 or 1 is not an issue. Define the mapping $\gamma : C([0, 1], \mathbb{R}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(x, h(x)) : x \in [0, 1]\}.$$

Suppose $h \in \mathcal{P}_N$ is a local minimum. The Minimum Test (Lemma 4) gives that no $\Delta h \in \mathcal{P}_N$
exists such that

$$\partial_2\Gamma(x, h(x))\Delta h(x) < 0 \quad (x \in \operatorname{crit}[\gamma(h)]).$$

The interpolating properties of the polynomials force a classical support and
alignment condition.


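Lemma 5 (Support) Let $U$ be an open subset containing $[0, 1]$. Let $\Gamma : U \times \mathbb{R} \to \mathbb{R}$
be $C^2$. Define the mapping $\gamma : C([0, 1], \mathbb{R}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(x, h(x)) : x \in [0, 1]\}.$$

Suppose $h \in \mathcal{P}_N$ is a local minimum of $\gamma : \mathcal{P}_N \to \mathbb{R}$. Assume $\partial_2\Gamma(x, h(x)) \ne 0$ for
$x \in \operatorname{crit}[\gamma(h)]$. Then $|\operatorname{crit}[\gamma(h)]| \ge N + 2$.

Proof: If $\operatorname{crit}[\gamma(h)]$ contains $N + 1$ points or fewer, there exists a $\Delta h \in \mathcal{P}_N$ such that

$$\Delta h(x) = -\operatorname{sign}(\partial_2\Gamma(x, h(x))) \quad (x \in \operatorname{crit}[\gamma(h)]).$$

This forces

$$\partial_2\Gamma(x, h(x))\Delta h(x) < 0 \quad (x \in \operatorname{crit}[\gamma(h)]).$$

By the Minimum Test (Lemma 4), $h \in \mathcal{P}_N$ is not a local minimum. This
contradiction forces at least $N + 2$ points into $\operatorname{crit}[\gamma(h)]$. ///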
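The interpolation step in this proof can be sketched with Lagrange interpolation: given at most $N + 1$ critical points, a polynomial of degree at most $N$ takes the prescribed values $-\operatorname{sign}(\partial_2\Gamma)$ at those points. In the sketch below the critical points and the values of $\partial_2\Gamma$ are made-up numbers for illustration, and `lagrange_interpolant` is a hypothetical helper name.

```python
def lagrange_interpolant(nodes, values):
    """Return the polynomial of degree < len(nodes) taking values[i] at nodes[i]."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(nodes, values)):
            term = yi
            for j, xj in enumerate(nodes):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Hypothetical data for N = 2: three critical points and the variation there.
crit = [0.1, 0.5, 0.9]
d2 = [0.7, -1.2, 0.4]          # assumed values of d2Gamma(x, h(x)) on crit

# Delta h interpolates -sign(d2Gamma) on the critical set.
targets = [-1.0 if v > 0 else 1.0 for v in d2]
delta_h = lagrange_interpolant(crit, targets)

products = [v * delta_h(x) for x, v in zip(crit, d2)]
print(products)   # all strictly negative, so the Minimum Test applies
```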


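Lemma 6 (Alignment) Let $U$ be an open subset containing $[0, 1]$. Let $\Gamma : U \times \mathbb{R} \to \mathbb{R}$
be $C^2$. Define the mapping $\gamma : C([0, 1], \mathbb{R}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(x, h(x)) : x \in [0, 1]\}.$$

Suppose $h \in \mathcal{P}_N \subset C([0, 1], \mathbb{R})$ is a local minimum of $\gamma : \mathcal{P}_N \to \mathbb{R}$. Assume
$\partial_2\Gamma(x, h(x)) \ne 0$ for $x \in \operatorname{crit}[\gamma(h)]$. Then $\partial_2\Gamma(x, h(x))$ admits an alternating sequence
of length $N + 2$ on $\operatorname{crit}[\gamma(h)]$. That is, there are at least $N + 2$ points $x_n \in \operatorname{crit}[\gamma(h)]$

$$0 \le x_1 < x_2 < \dots < x_{N+2} \le 1$$

such that

$$\operatorname{sign}(\partial_2\Gamma(x_n, h(x_n))) = -\operatorname{sign}(\partial_2\Gamma(x_{n+1}, h(x_{n+1}))).$$

Proof: This standard argument is from Cheney [11]. If $\partial_2\Gamma(x, h(x))$ alternates
only $N + 1$ times on $\operatorname{crit}[\gamma(h)]$, then a polynomial $\Delta h$ with $N$ zeros placed at the
sign changes of $\partial_2\Gamma(x, h(x))$, multiplied by $\pm 1$ as needed, has the opposite sign to
$\partial_2\Gamma(x, h(x))$ on $\operatorname{crit}[\gamma(h)]$. Thus, $\partial_2\Gamma(x, h(x))\Delta h(x) < 0$ on $\operatorname{crit}[\gamma(h)]$. By the
Minimum Test (Lemma 4), $h \in \mathcal{P}_N$ is not a local minimum. ///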

The  satisfying  property  of  Haar  spaces  is  that  this  condition  is  strong  enough 
to  force  a  useful  converse.  The  proof  reveals  how  the  inequality  furnished  by  the 
Descent  Lemma  must  be  folded  into  the  strict  inequality  required  by  the  Minimum 
Test  (Lemma  4). 




Corollary 1 Let $U$ be an open subset containing $[0, 1]$. Let $\Gamma : U \times \mathbb{R} \to \mathbb{R}$ be $C^2$.
Define the mapping $\gamma : C([0, 1], \mathbb{R}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(x, h(x)) : x \in [0, 1]\}.$$

Assume $h \in \mathcal{P}_N$ and $\partial_2\Gamma(x, h(x)) \ne 0$ for $x \in \operatorname{crit}[\gamma(h)]$. Then the following are
equivalent:

(a) $h \in \mathcal{P}_N$ is a local minimum.

(b) $\partial_2\Gamma(x, h(x))$ admits an alternating sequence of length $N + 2$ on $\operatorname{crit}[\gamma(h)]$.

Proof: We have (a)$\Rightarrow$(b) so we need to prove (b)$\Rightarrow$(a). Suppose (b) holds but (a) is
not true. The Descent Lemma (Lemma 3) provides a nonzero $\Delta h \in \mathcal{P}_N$ such that

$$\partial_2\Gamma(x, h(x))\Delta h(x) \le 0$$

on $\operatorname{crit}[\gamma(h)]$. Because $\partial_2\Gamma(x, h(x))$ is continuous, does not vanish on $\operatorname{crit}[\gamma(h)]$, and
alternates $N + 2$ times on $0 \le x_1 < \dots < x_{N+2} \le 1$, $\Delta h$ is forced to have at least one
zero in each interval $[x_n, x_{n+1}]$ for $n = 1, \dots, N + 1$. If $\Delta h$ were merely continuous,
$\Delta h$ could have as few as $\lfloor N/2 \rfloor + 1$ zeros. This configuration happens when the
zeros in each interval are common to adjacent end points. Figure 1 illustrates this
phenomenon for $N = 3$. The alternating sequence is marked with the arrows. The
graph of $\Delta h$ is schematically shown by the curved lines. The figure shows how $\Delta h$
can satisfy the inequality with only two zeros. However, $\Delta h$ is a polynomial, so these
zeros are at least second order. Thus, the third-order polynomial $\Delta h$ has four zeros
counted with multiplicity. More generally, any equality $\partial_2\Gamma(x_n, h(x_n))\Delta h(x_n) = 0$ still
forces $N + 1$ zeros into $\Delta h \in \mathcal{P}_N$. This contradicts $\Delta h \ne 0$. Then (a) must be true. ///
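Corollary 1 suggests a simple numerical check: sample the critical set, read off the signs of $\partial_2\Gamma$ there, and compare the alternation length with $N + 2$. The sketch below is a grid-based approximation under assumptions that are not from the report: the helper names, the tolerance, and the textbook test case of approximating $x^2$ by a line on $[0, 1]$ with $\Gamma(x, w) = (x^2 - w)^2$.

```python
def alternation_length(values):
    """Length of the longest alternating-sign run through the nonzero values."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def passes_alternation_test(perf, d2_perf, coeffs, grid, rel_tol=1e-9):
    """Check condition (b) of Corollary 1 on a sampled critical set."""
    n_deg = len(coeffs) - 1
    h = lambda x: sum(c * x ** n for n, c in enumerate(coeffs))
    values = [perf(x, h(x)) for x in grid]
    gamma = max(values)
    tol = rel_tol * max(1.0, gamma)
    crit = [x for x, v in zip(grid, values) if gamma - v <= tol]
    return alternation_length([d2_perf(x, h(x)) for x in crit]) >= n_deg + 2

# Assumed test case: fit x^2 on [0, 1] by a degree-1 polynomial (N = 1).
grid = [k / 1000 for k in range(1001)]
perf = lambda x, w: (x * x - w) ** 2
d2_perf = lambda x, w: -2.0 * (x * x - w)

print(passes_alternation_test(perf, d2_perf, [-0.125, 1.0], grid))  # h(x) = x - 1/8
print(passes_alternation_test(perf, d2_perf, [0.0, 1.0], grid))     # h(x) = x
```

The first candidate is the classical equal-ripple line for $x^2$, whose error has magnitude $1/8$ at $0$, $1/2$, and $1$ with alternating sign; the second candidate has a single critical point.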


Figure  1:  Minimal  number  of  zeros  of  a  continuous  Ah  for  an  alternating  sequence  of 
length  5. 

What Corollary 1 demonstrates is that the Kolmogorov Criterion is a general
optimization technique that is strong enough to specialize to specific problems,
namely nonlinear polynomial optimization.
4.1 Nonlinear Approximation of $\exp(-x)$

The exponential function is a classical test of approximation schemes. This section
assesses one nonlinear approximation scheme of the exponential function: find a
polynomial

$$h(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3 \in \mathcal{P}_3$$

that fits the exponential function as follows:

$$e^{-x} \approx h(x)^{-1}.$$

One approach chooses the performance function:

$$\Gamma(x, h(x)) = (e^{-x} - h(x)^{-1})^2.$$

The objective function is

$$\gamma(h) := \sup\{\Gamma(x, h(x)) : x \in [0, 1]\}.$$

The goal is to minimize the worst fit

$$\min\{\gamma(h) : h \in \mathcal{H}\}$$

over the subset $\mathcal{H} \subset \mathcal{P}_3$ consisting of those polynomials that never vanish on the unit
interval. Although $\mathcal{H}$ is not a linear subspace, it is an open subset of $\mathcal{P}_3$. As an open set,
$\mathcal{H}$ admits enough local linear space structure to apply the Kolmogorov theory. The variation
of the performance function is

$$\partial_2\Gamma(x, h) = \partial_h (e^{-x} - h^{-1})^2 = 2(e^{-x} - h^{-1})h^{-2}.$$

Corollary 1 applies, provided the gradient $\partial_2\Gamma(x, h)$ does not vanish on the critical
set of $\gamma(h)$: if $x \in \operatorname{crit}[\gamma(h)]$,

$$\partial_2\Gamma(x, h(x)) \ne 0 \iff 2(e^{-x} - h(x)^{-1})h(x)^{-2} \ne 0.$$

The error term cannot vanish because $h(x)$ is not a perfect fit to $\exp(x)$. The
rational function $h(x)^{-2}$ cannot vanish on the unit interval. Consequently, no constraints
on $h(x)$ really exist, except that $h(x)$ never vanishes on the unit interval.
Therefore, Corollary 1 applies to characterize local minima: the gradient $\partial_2\Gamma(x, h)$ has an
alternating sequence of length 5. Specifically,

$$0 \le x_1 < x_2 < x_3 < x_4 < x_5 \le 1$$

must exist in $\operatorname{crit}[\gamma(h)]$ such that

$$\operatorname{sign}(\partial_2\Gamma(x_n, h(x_n))) = -\operatorname{sign}(\partial_2\Gamma(x_{n+1}, h(x_{n+1}))).$$
Because

$$\operatorname{sign}(\partial_2\Gamma(x, h(x))) = \operatorname{sign}(e^{-x} - h(x)^{-1}),$$

a local minimum is characterized whenever the error term $e^{-x} - h(x)^{-1}$ alternates in
sign on $\operatorname{crit}[\gamma(h)]$. Figure 2 illustrates such an alternating sequence close to a local
minimum. The coefficients of this near-local minimum are listed on the right of the
plot:

$$h_{\min}(x) = 0.9998 + 1.0088x + 0.4453x^2 + 0.2629x^3.$$

The solid red segments mark those $x \in [0, 1]$ in the 95% neighborhood of $\operatorname{crit}[\gamma(h)]$:

$$|e^{-x} - h_{\min}(x)^{-1}| \ge 0.95 \times \gamma(h_{\min}).$$


[Plot: error curve $e^{-x} - h_{\min}(x)^{-1}$ for $h_{\min} \in \mathcal{P}_3$.]

Figure  2:  Error  curve  at  near-local  minimum. 
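The alternation of the error term for the quoted coefficients can be inspected numerically. The sketch below evaluates $e^{-x} - h_{\min}(x)^{-1}$ on a grid, groups the samples into lobes of constant sign, and reports each lobe's peak relative to the sup of the error. The grid size and the lobe bookkeeping are illustrative choices; since the coefficients are rounded to four decimals, the peaks should be close to one another but not exactly equal.

```python
import math

# Coefficients of the near-local minimum quoted in the text.
H_MIN = (0.9998, 1.0088, 0.4453, 0.2629)

def error(x, coeffs=H_MIN):
    """Error term exp(-x) - 1/h(x) for the cubic h with the given coefficients."""
    h = sum(c * x ** n for n, c in enumerate(coeffs))
    return math.exp(-x) - 1.0 / h

# Sample the unit interval; the grid size is an arbitrary choice.
grid = [k / 2000 for k in range(2001)]
errors = [error(x) for x in grid]
sup_error = max(abs(e) for e in errors)

# Group consecutive samples of equal sign into lobes: [sign, peak |error|].
lobes = []
for e in errors:
    sign = 1 if e > 0 else -1
    if lobes and lobes[-1][0] == sign:
        lobes[-1][1] = max(lobes[-1][1], abs(e))
    else:
        lobes.append([sign, abs(e)])

for sign, peak in lobes:
    print(sign, peak / sup_error)
```

Five lobes of alternating sign with comparable peaks correspond to the alternating sequence of length 5 required by Corollary 1 for $N = 3$.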
For completeness, slices of the error surface at $h_{\min}$ are also plotted. The error
surface is the graph of

$$h_0 + h_1 x + h_2 x^2 + h_3 x^3 \mapsto \gamma(h)$$

and needs five dimensions to plot. By varying only two coefficients of $h_{\min}$, we can see
a three-dimensional slice of this error. Figures 3, 4, and 5 show these slices of the error
function around $h_{\min}$. The plots reveal two general features. First, the minimum looks
unique. Second, the error surface has non-differentiable "creases" that run through
the minimum. Both features have theoretical and numerical consequences. Section 9
discusses these consequences and opportunities for research with the remainder of the
research topics.


Figure 3: $h_2$-$h_3$ slice of the error surface at a local minimum.


Figure 4: $h_1$-$h_3$ slice of the error surface at a local minimum.


Figure  5:  h^-h?,  slice  of  the  error  surface  at  a  local  minimum. 
4.2 Approximating Outer Functions

Computing outer functions is a common task in applied harmonic analysis [5] and
signal processing [19]. The problem is to find a polynomial $h(z)$ that approximates a
positive-valued function $q(z) > 0$ as follows:

$$q(z) \approx \exp(\Re[h(z)]) \quad (z \in \mathbb{T}).$$

Although the general problem is on the unit circle, applications restrict $q(z)$ to be
"real" [17, Eq. 1.1]:

$$q(\bar{z}) = q(z).$$

Because this real symmetry is inherited by best approximations [21], [17], it suffices
to approximate using polynomials with real coefficients. If $h(z)$ has real coefficients
$h_n$, we can expand $\Re[h(z)]$ as follows:

$$\begin{aligned}
\Re[h(z)] &= \sum_{n=0}^{N} h_n \Re[z^n] \quad (z = e^{i\theta}) \\
&= \sum_{n=0}^{N} h_n \cos(n\theta) \\
&= \sum_{n=0}^{N} h_n T_n(x) \quad (x = \cos(\theta)),
\end{aligned}$$

where $T_n$ is the Chebyshev polynomial. Therefore, the real polynomial approximation
of real functions on the unit circle is equivalent to approximation on the real interval
$[-1, 1]$ by real polynomials. Consider the real outer function
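The identity $\Re[h(e^{i\theta})] = \sum_n h_n T_n(\cos\theta)$ is easy to confirm numerically. The sketch below uses the standard three-term recurrence $T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$; the coefficient values and function names are arbitrary illustrations.

```python
import cmath
import math

def chebyshev_T(n, x):
    """Chebyshev polynomial T_n(x) via the three-term recurrence."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def real_part_on_circle(coeffs, theta):
    """Re[h(e^{i theta})] for h(z) = sum_n coeffs[n] * z^n with real coeffs."""
    z = cmath.exp(1j * theta)
    return sum(c * z ** n for n, c in enumerate(coeffs)).real

def chebyshev_series(coeffs, x):
    """sum_n coeffs[n] * T_n(x)."""
    return sum(c * chebyshev_T(n, x) for n, c in enumerate(coeffs))

coeffs = (0.5, -1.0, 0.25, 2.0)   # arbitrary real coefficients
thetas = [k * math.pi / 6 for k in range(13)]
gaps = [abs(real_part_on_circle(coeffs, t) - chebyshev_series(coeffs, math.cos(t)))
        for t in thetas]
print(max(gaps))
```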

$$g(z) = (z - a)^{-1}$$

with real pole $a$ exterior to the unit disk. The magnitude of $g(z)$ on the unit circle is
the target function:

$$q(z) = |g(z)|.$$

Although we are starting with the answer, any real $0 < q \in C(\mathbb{T})$ is approximated to
arbitrary precision by $\exp(\Re[h(z)])$, where $h(z)$ is a real polynomial. Introduce the
performance function

$$\Gamma(z, h(z)) = |q(z) - \exp(\Re[h(z)])|^2$$

and the objective function

$$\gamma(h) = \sup\{\Gamma(z, h(z)) : z \in \mathbb{T}\}.$$

The goal is to minimize the worst fit over the polynomials:

$$\min\{\gamma(h) : h \in \mathcal{P}_N\}.$$
To apply Corollary 1, switch to the real formalism:

$$\Gamma(z, h(z)) = (Q(x) - \exp(H(x)))^2, \qquad \begin{cases} Q(x) = q(z), \\ H(x) = \Re[h(z)], \\ x = \Re[z] = \cos(\theta). \end{cases}$$

The variation of the performance function is

$$\partial_2\Gamma(x, h) = \partial_h (Q(x) - \exp(H))^2 = -2(Q(x) - \exp(H))\exp(H).$$

Because $\exp(H) > 0$, the following are equivalent:

• $\partial_2\Gamma(x, h(x))$ admits an alternating sequence of length $N + 2$ on $\operatorname{crit}[\gamma(h)]$;

• $Q(x) - \exp(H)$ admits an alternating sequence of length $N + 2$ on $\operatorname{crit}[\gamma(h)]$.
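The switch to the real formalism can be made concrete for $g(z) = (z - a)^{-1}$. On the unit circle $|e^{i\theta} - a|^2 = 1 - 2a\cos\theta + a^2$, so the target depends on $z$ only through $x = \cos\theta$. The sketch below assumes $a = 1.3$ and hypothetical function names; it checks that identity, the real symmetry of $q$, and that the variation has the opposite sign to the error term, which leaves alternation unchanged.

```python
import cmath
import math

A = 1.3   # assumed real pole outside the unit disk

def q_on_circle(theta, a=A):
    """Target q(z) = |(z - a)^{-1}| at z = e^{i theta}."""
    return abs(1.0 / (cmath.exp(1j * theta) - a))

def Q(x, a=A):
    """Same target written in the real variable x = cos(theta)."""
    return 1.0 / math.sqrt(1.0 - 2.0 * a * x + a * a)

def d2_perf(x, H, a=A):
    """Variation of (Q(x) - exp(H))^2 with respect to H."""
    return -2.0 * (Q(x, a) - math.exp(H)) * math.exp(H)

thetas = [k * math.pi / 8 for k in range(9)]
mismatch = max(abs(q_on_circle(t) - Q(math.cos(t))) for t in thetas)
asymmetry = max(abs(q_on_circle(-t) - q_on_circle(t)) for t in thetas)
print(mismatch, asymmetry)

# The variation and the error term Q - exp(H) have opposite signs.
x0, H0 = 0.3, 0.0
print(d2_perf(x0, H0), Q(x0) - math.exp(H0))
```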

Figure  6  compares  the  outer  function  and  an  approximation  from  the  polynomials  of 
degree  6. 


[Plot: $|(z - a)^{-1}|$ and $\exp(\Re[h_{\min}(z)])$, $h_{\min} \in \mathcal{P}_6$, $a = 1.3$.]


Figure  6:  Outer  function  approximation. 
Figure 7 displays the error. The right side of the plot lists the coefficients of $h_{\min}$.
The solid red segments mark those $x \in [-1, 1]$ in the 90% neighborhood of $\operatorname{crit}[\gamma(h)]$:

$$\Gamma(x, h_{\min}(x)) \ge 0.90 \times \gamma(h_{\min}).$$

The last two segments run together on the 90% neighborhood. Close examination of
the plot shows that the error curve does alternate in sign eight times. Consequently,
$h_{\min}$ is a nearly optimal minimizer.


[Plot: $|(z - a)^{-1}| - \exp(\Re[h_{\min}(z)])$.]


Figure  7:  Outer  function  error. 
5 Optimization on $\mathcal{A}(\mathbb{D})$


Helton and Merino [17] characterized disk algebra minimizers in their book. More
importantly, their book discusses several computer programs that compute these
minimizers. This section shows that the Kolmogorov Criterion also characterizes local
minimizers of the disk algebra. The point is not to reinvent the results of Helton and
Merino but to show that the Kolmogorov Criterion provides a general framework for
optimization problems that also encompasses the disk algebra. The disk algebra

$$\mathcal{A}(\mathbb{D}) := H^\infty(\mathbb{D}) \cap C(\mathbb{T}, \mathbb{C}) = \overline{\operatorname{span}}\{1, z, z^2, z^3, \ldots\}$$

is essentially the space of polynomials on the unit disk. Analogous to the previous
results on polynomial optimization, we will see that the support and alignment
conditions readily follow from the Kolmogorov Criterion. Mergelyan's Theorem gives us
our support condition.

Theorem 2 (Mergelyan) [22, page 423] If $K$ is a compact set in $\mathbb{C}$ with connected
complement, if $f \in C(K, \mathbb{C})$ is analytic on the interior of $K$, and if $\epsilon > 0$, there exists
a polynomial $p(z)$ such that $\|f - p\|_{C(K, \mathbb{C})} < \epsilon$.

If the critical set $\operatorname{crit}[\gamma(h)]$ is not the entire unit circle $\mathbb{T}$, Mergelyan's
Theorem forces the existence of a $\Delta h \in \mathcal{A}(\mathbb{D})$ that matches the performance function's
variation:

$$\Delta h(z) = -\overline{\partial_2\Gamma(z, h(z))} \quad (z \in \operatorname{crit}[\gamma(h)]).$$

If the variation does not vanish on the critical set,

$$0 > \Re[\partial_2\Gamma(z, h(z))\Delta h(z)] \quad (z \in \operatorname{crit}[\gamma(h)]),$$

and the Minimum Test (Lemma 4) states that $h \in \mathcal{A}(\mathbb{D})$ cannot be a local minimum.
Conversely, if $h \in \mathcal{A}(\mathbb{D})$ is a local minimum, the critical set is the entire unit circle.

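Lemma 7 (Support) Let $U$ be an open subset containing $\mathbb{T}$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$
be $C^2$. Define $\gamma : \mathcal{A}(\mathbb{D}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(z, h(z)) : z \in \mathbb{T}\}.$$

Suppose $h \in \mathcal{A}(\mathbb{D})$ is a local minimum of $\gamma$. Assume $|\partial_2\Gamma(z, h(z))| > 0$ on $\operatorname{crit}[\gamma(h)]$.
Then $\operatorname{crit}[\gamma(h)] = \mathbb{T}$.

Proof: If $\operatorname{crit}[\gamma(h)]$ is not the entire unit circle $\mathbb{T}$, the continuity of $\partial_2\Gamma(z, h(z))$
permits an application of Mergelyan's Theorem: there exists a $\Delta h \in \mathcal{A}(\mathbb{D})$ such that

$$\Delta h(z) = -\overline{\partial_2\Gamma(z, h(z))} \quad (z \in \operatorname{crit}[\gamma(h)]).$$

Then $\Re[\partial_2\Gamma(z, h(z))\Delta h(z)] < 0$ on $\operatorname{crit}[\gamma(h)]$. The inequality is strict because $\partial_2\Gamma(z, h(z))$
is assumed never to vanish. The Minimum Test (Lemma 4) asserts that $h \in \mathcal{A}(\mathbb{D})$
cannot be a minimum of $\gamma$. This contradiction forces $\operatorname{crit}[\gamma(h)] = \mathbb{T}$. ///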

Alignment is a little more tricky. The basic idea was pointed out in Section 3. A
local minimum will force $\partial_2\Gamma(z, h(z))\Delta h(z)$ to wind around zero. For $w \in C(\mathbb{T}, \mathbb{C})$,
let $\operatorname{Wind}[w(z), \mathbb{T}]$ algebraically count the number of times $w(z)$ winds around 0. The
"alternating condition" of the real polynomials turns into a positive winding number
at a local minimum:

$$\operatorname{Wind}[\partial_2\Gamma(z, h(z)), \mathbb{T}] > 0.$$
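A winding number of a sampled curve can be estimated by summing the phase increments of $w(e^{i\theta})$ around the circle. The following sketch assumes that $w$ does not vanish on $\mathbb{T}$ and that the sampling is fine enough for each phase step to stay below $\pi$ in magnitude; the function name and the sample count are illustrative.

```python
import cmath
import math

def winding_number(w, samples=4096):
    """Estimate Wind[w(z), T] by accumulating phase increments around the circle."""
    total = 0.0
    prev = w(cmath.exp(0j))
    for k in range(1, samples + 1):
        z = cmath.exp(2j * math.pi * k / samples)
        curr = w(z)
        # Phase change between consecutive samples, taken in (-pi, pi].
        total += cmath.phase(curr / prev)
        prev = curr
    return round(total / (2.0 * math.pi))

print(winding_number(lambda z: z ** 2))          # winds twice counterclockwise
print(winding_number(lambda z: z.conjugate()))   # winds once clockwise
print(winding_number(lambda z: z + 3))           # does not enclose the origin
```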

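The trick is to link the phase of $\partial_2\Gamma(z, h(z))$ to elements of $\mathcal{A}(\mathbb{D})$. The Poisson
integral is the starting point. For a complex Borel measure $\mu$ on $\mathbb{T}$, the harmonic
extension of $\mu$ is its Poisson integral [22, pages 252-255]:

$$P[\mu](z) = \frac{1}{2\pi}\int_{\mathbb{T}} \Re\left[\frac{e^{it} + z}{e^{it} - z}\right] d\mu(t) \quad (z \in \mathbb{D}).$$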

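Theorem 3 (Harmonic Extension) [22, page 254] Let $h \in C(\mathbb{T}, \mathbb{C})$ and define
$H[h]$ on $\overline{\mathbb{D}}$ by

$$H[h](re^{i\theta}) = \begin{cases} h(e^{i\theta}) & r = 1 \\ P[h](re^{i\theta}) & r \in [0, 1). \end{cases}$$

Then $H[h] \in C(\overline{\mathbb{D}})$.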

Thus,  functions  in  C(T,  C)  admit  harmonic  extensions  to  D  that  are  continuous 
on  D.  Closely  related  is  the  corresponding  analytic  extension. 

Theorem  4  [22,  page  255]  Suppose  u  is  a  real-valued  function  continuous  on  D  and 
harmonic  on  D .  Then  (on  TD)  u  is  the  Poisson  integral  of  its  restriction  to  T  and 
the  real  part  of  the  analytic  function 

h(z)  =  —  /  e-^u(e«)dt  {z  G  D). 

2n  j  — 7r  en  —  z 

Lemma 8 (Phase Alignment) Let $U$ be an open subset containing $\mathbb{T}$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$ be $C^2$. Define $\gamma : \mathcal{A}(\mathbb{D}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(z,h(z)) : z \in \mathbb{T}\}.$$

Suppose $h \in \mathcal{A}(\mathbb{D})$ is a local minimum of $\gamma$. Assume $|\partial_2\Gamma(z,h(z))| > 0$ on $\mathrm{crit}[\gamma(h)]$.
Then $\mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] > 0$.

Proof: Suppose $-k = \mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] \leq 0$. Continuity gives that $k$ is finite. Thus, the differential error term has phase like $\bar{z}^k$. Set

$$v(e^{i\theta}) = -\mathrm{Arg}[z^k\,\partial_2\Gamma(z,h(z))](e^{i\theta}).$$

Then $v \in C(\mathbb{T})$. Use Theorem 3 to extend $v$ to a real function continuous on $\overline{\mathbb{D}}$ and harmonic on $\mathbb{D}$. Use Theorem 4 to extend $v$ as the imaginary part of an analytic function $g = u + iv$ on $\mathbb{D}$. For $0 < r < 1$, define $g_r(z) = g(rz)$ on $\overline{\mathbb{D}}$. Then $g_r \in \mathcal{A}(\mathbb{D})$. Its imaginary part $v_r$ converges to $v$ as $r \to 1$. Set $\Delta h_r = \exp(g_r)$. Then $\Delta h_r$ belongs to $\mathcal{A}(\mathbb{D})$ and so does $z^k \Delta h_r(z)$. Then as $r \to 1$ there holds

$$\Re[\partial_2\Gamma(z,h(z))\, z^k \Delta h_r(z)] = \Re\left[|\partial_2\Gamma(z,h(z))|\, e^{-iv(z)}\, |\Delta h_r(z)|\, e^{iv_r(z)}\right] > 0.$$

Then the Minimum Test (Lemma 4) asserts that $h$ cannot be a minimum of $\gamma$. This contradiction forces the winding number to be strictly positive. ///

The upshot of Lemmas 7 and 8 is that if $h \in \mathcal{A}(\mathbb{D})$ is a local minimum:

• $\Gamma(z,h(z))$ is constant on $\mathbb{T}$ or $\mathrm{crit}[\gamma(h)] = \mathbb{T}$.

• $\mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] > 0$.
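The winding condition can be checked numerically once the differential is sampled on the unit circle. The following is a minimal sketch, not the report's own code: the function name `winding_number` and the sampling density are assumptions made for illustration. It sums the phase increments between consecutive samples of a closed curve.

```python
import numpy as np

def winding_number(w_samples):
    """Estimate how many times a closed curve winds around 0.

    w_samples: complex values w(z_k) at points z_k ordered counterclockwise
    around the unit circle (the curve is closed implicitly).
    """
    w = np.asarray(w_samples, dtype=complex)
    if np.any(w == 0):
        raise ValueError("curve passes through 0; winding number undefined")
    closed = np.append(w, w[0])                  # close the curve
    # Phase increment between consecutive samples, each in (-pi, pi].
    steps = np.angle(closed[1:] / closed[:-1])
    return int(np.rint(steps.sum() / (2.0 * np.pi)))

theta = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
z = np.exp(1j * theta)
wind_z = winding_number(z)                 # z winds once, positively
wind_inv_sq = winding_number(1 / z**2)     # z^{-2} winds twice, negatively
wind_shifted = winding_number(z - 2.0)     # this curve does not enclose 0
```

The estimate is reliable only when the sampling is dense enough that the phase changes by less than $\pi$ between neighboring samples.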


As with the polynomials, the disk algebra has enough structure to force a converse, provided the differential does not vanish. To see this, suppose $\mathrm{crit}[\gamma(h)] = \mathbb{T}$ and $\mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] > 0$ but $h \in \mathcal{A}(\mathbb{D})$ is not a local minimum. By the Descent Lemma, there exists a nonzero $\Delta h \in \mathcal{A}(\mathbb{D})$ such that

$$0 \geq \Re[\partial_2\Gamma(z,h(z))\,\Delta h(z)] \qquad (z \in \mathbb{T}). \qquad (1)$$

Figure 8 illustrates the geometry. For fixed $z \in \mathbb{T}$, the complex vector $\partial_2\Gamma(z,h(z))$ determines the solid half-plane consisting of all $\Delta h \in \mathbb{C}$ that satisfy Equation (1). Figure 8 also plots the conjugate $\overline{\partial_2\Gamma(z,h(z))}$. The plot shows $\Delta h$ must belong to the negative cone

$$\partial_2\Gamma(h)^{\ominus}(e^{i\theta}) := \{v \in \mathbb{R}^2 : 0 \geq \partial_2\Gamma(e^{i\theta},h(e^{i\theta}))^T v\},$$

where we switch to real coordinates in the negative cone. Referring again to Figure 8, we see that if $\partial_2\Gamma(z,h(z))$ winds positively around zero, the negative cone $\partial_2\Gamma(h)^{\ominus}(z)$ must wind negatively around zero. Because $\Delta h$ belongs to this cone that winds negatively around zero, $\Delta h$ must also wind negatively around zero (provided that it never vanishes on $\mathbb{T}$). But $\Delta h$ belongs to $\mathcal{A}(\mathbb{D})$, so $\Delta h$ must have a non-negative winding number. These contradictory winding numbers for $\Delta h$ imply that $\Delta h$ cannot exist as a "direction of descent." Thus, the positive winding number of the differential forces $h \in \mathcal{A}(\mathbb{D})$ to be a local minimum. This is the geometric idea of the winding number. The technical part is handling the case in which $\Delta h$ does have zeros on $\mathbb{T}$. The following result summarizes our Kolmogorov approach, which captures a slightly weaker result than that obtained by Helton and Merino in 1998.

Figure 8: Solid half-plane marking $\Delta h \in \mathbb{C}$ for which $\Re[\partial_2\Gamma\,\Delta h] \leq 0$.


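Corollary 2 [17, Theorem 9.3.1] Let $U$ be an open subset containing $\mathbb{T}$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}$ be $C^2$. Define $\gamma : \mathcal{A}(\mathbb{D}) \to \mathbb{R}$ by

$$\gamma(h) := \sup\{\Gamma(z,h(z)) : z \in \mathbb{T}\}.$$

Assume $|\partial_2\Gamma(z,h(z))| > 0$ on $\mathbb{T}$. Then the following are equivalent:

(a) $h \in \mathcal{A}(\mathbb{D})$ is a local minimum of $\gamma$.

(b) $\Gamma(z,h(z))$ is constant on $\mathbb{T}$ and $\mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] > 0$.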


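Proof: Lemmas 7 and 8 give that (a) implies (b). For the converse, assume (b) holds but that (a) is not true: $h \in \mathcal{A}(\mathbb{D})$ is not a local minimum. The Descent Lemma (Lemma 3) provides a nonzero $\Delta h \in \mathcal{A}(\mathbb{D})$ such that $0 \geq \Re[\partial_2\Gamma(z,h(z))\,\Delta h(z)]$ on $\mathbb{T}$. Let

$$k = \mathrm{Wind}[\partial_2\Gamma(z,h(z)),\mathbb{T}] > 0.$$

Then $\Gamma$ being $C^2$ with a nonzero variation permits us to write

$$\partial_2\Gamma(e^{i\theta},h(e^{i\theta})) = |\partial_2\Gamma(e^{i\theta},h(e^{i\theta}))|\, e^{ik\theta}\, e^{iv(e^{i\theta})},$$

where $v(e^{i\theta})$ is real and continuous. In terms of the inequality, there holds for all $z \in \mathbb{T}$:

$$0 \geq \Re\left[|\partial_2\Gamma(z,h(z))|\, z^k e^{iv(z)}\, \Delta h(z)\right]. \qquad (2)$$

Use Theorem 3 to extend $v$ to a real function continuous on $\overline{\mathbb{D}}$ and harmonic on $\mathbb{D}$. Use Theorem 4 to extend $v$ as the imaginary part of an analytic function $g = u + iv$ on $\mathbb{D}$. Observe that $\exp(g)\Delta h \in H^\infty(\mathbb{D})$ and $k > 0$, so that the mean value of $z^k \exp(g)\Delta h$ over $\mathbb{T}$ equals its value at the origin:

$$0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\theta}\, e^{g(e^{i\theta})}\, \Delta h(e^{i\theta})\, d\theta.$$

Take the real part of both sides to get

$$0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Re\left[e^{ik\theta}\, e^{iv(e^{i\theta})}\, \Delta h(e^{i\theta})\right] e^{u(e^{i\theta})}\, d\theta.$$

Equation (2) gives that the "real part" of the integrand is nonpositive so that

$$0 = \Re\left[e^{ik\theta}\, e^{iv(e^{i\theta})}\, \Delta h(e^{i\theta})\right] \quad \text{a.e.}$$

Continuity implies that equality holds everywhere. ///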


5.1 Hyperbolic Approximation to $(z - a)^{-1}$


A canonical problem in $H^\infty$ Engineering is computing the hyperbolic distance from the disk algebra to a given function [14]. The pseudo-hyperbolic distance¹ $\rho$ between two elements $g, h \in \mathbb{D}$ is [27, page 58]:

$$\rho(g,h) := \left|\frac{g - h}{1 - \bar{g}h}\right|. \qquad (3)$$

Fix $g \in L^\infty(\mathbb{T})$ and assume $\|g\|_\infty < 1$. Let $h$ vary over the disk algebra with $\|h\|_\infty < 1$. The pseudo-hyperbolic distance between $g(z)$ and $h(z)$ defines the performance function:

$$\Gamma(z,h(z)) := \left|\frac{g(z) - h(z)}{1 - \overline{g(z)}\,h(z)}\right|^2.$$

The objective function is

$$\gamma(h) := \sup\{\Gamma(z,h(z)) : z \in \mathbb{T}\}.$$

The minimization problem is

$$\inf\{\gamma(h) : h \in \mathcal{A}(\mathbb{D})\}.$$
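The objective can be evaluated numerically by sampling the unit circle. The sketch below is illustrative only. It assumes that the performance function is the squared pseudo-hyperbolic distance (as the later expression for $\Gamma^{1/2}$ suggests), and the pole location `a`, the factor `scale`, and the constant trial function are hypothetical values chosen so that $g$ lands in the unit ball; they are not the values used in the report.

```python
import numpy as np

def pseudo_hyperbolic(g, h):
    """rho(g, h) = |g - h| / |1 - conj(g) h| for points of the open unit disk."""
    g = np.asarray(g, dtype=complex)
    h = np.asarray(h, dtype=complex)
    return np.abs((g - h) / (1.0 - np.conj(g) * h))

def objective_and_flatness(g_on_circle, h_on_circle):
    """Return (sup, sup - inf) of Gamma = rho^2 over the circle samples."""
    gamma_values = pseudo_hyperbolic(g_on_circle, h_on_circle) ** 2
    return gamma_values.max(), gamma_values.max() - gamma_values.min()

theta = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
z = np.exp(1j * theta)
a, scale = 0.5, 0.25          # hypothetical pole location and scaling
g = scale / (z - a)           # |g| <= scale / (1 - a) = 0.5 on the circle
h = np.full_like(z, 0.1)      # a constant trial function in the disk algebra
sup_gamma, flat = objective_and_flatness(g, h)
```

A trial function is a candidate minimizer only when `flat` is numerically zero, which is the flatness condition of Corollary 2.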


¹ Also known as the Poincaré hyperbolic distance function; it is a metric on $\mathbb{D}$ [26].

Corollary 2 characterizes local solutions by the flatness and winding conditions:

• $\Gamma(z,h_{\min}(z))$ is constant on $\mathbb{T}$,

• $\mathrm{Wind}[\partial_2\Gamma(z,h_{\min}(z)),\mathbb{T}] > 0$,

provided $g \in C^2$. For example, to make a function not in the disk algebra, put a pole at $0 < a < 1$ and set


The  scaling  puts  g(z)  into  the  unit  ball  of  L°°(T).  Figure  9  shows  the  image  of 
g(z)  in  the  unit  disk.  The  goal  is  to  approximate  g(z)  from  the  disk  algebra  in  the 
pseudo-hyperbolic  metric. 


Figure 9: $g(z)$ for $z \in \mathbb{T}$.

The best approximation is a constant:

$$h_{\min} = 0.1883.$$

That is, $h_{\min}$ minimizes the pseudo-hyperbolic distance from $g(z)$ to the disk algebra. Figure 10 plots the complex error curve:

$$\rho_{\mathbb{C}}(g(z),h_{\min}(z)) := \frac{g(z) - h_{\min}(z)}{1 - \overline{g(z)}\,h_{\min}(z)}.$$

The figure shows that the error is circular. Helton insightfully saw that this circularity of the error transplanted Nehari's Theorem from the complex plane to the disk equipped with the hyperbolic metric. The full power of Helton's insight became apparent when he extended this result to a nonlinear performance function: this error curve actually encodes a flatness condition and the winding number condition.


Figure 10: Complex error curve of $\rho_{\mathbb{C}}(g(z),h_{\min}(z))$ for $z = \exp(j\theta)$.

Figure 11 plots the error curve at the minimizer:

$$\Gamma(z,h_{\min}(z))^{1/2} := \left|\frac{g(z) - h_{\min}(z)}{1 - \overline{g(z)}\,h_{\min}(z)}\right| \qquad (z = e^{i\theta}).$$

Examination of the vertical axis shows that $\Gamma(z,h_{\min}(z))$ is numerically flat:

$$\Gamma(z,h_{\min}(z)) = \text{constant}.$$

[Figure 11 plot: error curve for $z = \exp(i\theta)$ against $\theta$ (deg) from 0 to 350; every vertical tick reads 0.3441.]

Figure 11: Flatness of $\Gamma(z,h_{\min}(z))$.

The differential of the performance function at the minimizer is


d2r(z, 

Figure 12 plots this differential. The differential winds once around zero, which is the winding condition:

$$\mathrm{Wind}[\partial_2\Gamma(z,h_{\min}(z))] \geq 1.$$

As expected, the winding number of the differential is positive at the minimizer. What is unexpected is that the differential is also flat.

[Figure 12 plot: winding number of $\partial_2\Gamma(z,h_{\min}(z))$ for $z = \exp(i\theta)$.]

Figure 12: Differential $\partial_2\Gamma(z,h_{\min}(z))$ at the minimizer.


5.2  Helton’s  Example 

Helton and Merino [17, page 142] offer a computer solution to the minimization problem:

$$\gamma_{\mathcal{A}} := \inf\{\gamma(h) : h \in \mathcal{A}(\mathbb{D})\}$$

on the disk algebra for the performance function

$$\Gamma(z,h(z)) = \left|0.8 + (z^{-1} + h(z))^2\right|^2. \qquad (4)$$

The power of Helton and Merino's solution is that it overcomes the infinite-dimensional nature of this minimization problem by using Nehari's Theorem. They estimate that

$$1.0005821 \approx \gamma_{\mathcal{A}} = \inf\{\gamma(h) : h \in \mathcal{A}(\mathbb{D})\}.$$

Our approach is absolutely pedestrian: simply approximate the disk algebra by the polynomials. This is a typical engineering approach because the engineer typically has only a finite number of parameters to synthesize a solution. This engineering approach becomes less pedestrian by comparing the suboptimal result against the best bound of Helton and Merino [17]. Benchmarking engineering solutions against the best bound is becoming common in impedance matching [13], [24], [4], [3]; amplifier optimization [14], [2]; and control problems [15], [17].

For example, Figure 13 plots the performance function of Equation (4) evaluated on the minimizer restricted to the polynomials of degree 11. The plot shows that this minimum is relatively close to the best bound:

$$1.0005821 \approx \gamma_{\mathcal{A}} \leq \inf\{\gamma(h) : h \in \mathcal{P}^{11}\} = 1.0149.$$

Whether this suboptimal solution is "good enough" is the decision that an engineer must make.
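The polynomial approach can be sketched in a few lines: sample the unit circle, evaluate the performance function of Equation (4) on a polynomial $h$, and report the objective together with a flatness measure. This is an illustrative sketch rather than the code behind Figure 13. The function names, the sampling density, and the choice of flatness as the sup minus the inf of the sampled values are assumptions, and no optimizer is included.

```python
import numpy as np

def helton_performance(coeffs, num_samples=2048):
    """Sample Gamma(z, h(z)) = |0.8 + (1/z + h(z))^2|^2 on the unit circle.

    coeffs: polynomial coefficients [h_0, h_1, ..., h_N] of h(z) = sum h_n z^n.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    z = np.exp(1j * theta)
    h = np.polynomial.polynomial.polyval(z, np.asarray(coeffs, dtype=complex))
    return np.abs(0.8 + (1.0 / z + h) ** 2) ** 2

def objective(coeffs):
    """Sampled version of gamma(h) = sup of Gamma over the circle."""
    return helton_performance(coeffs).max()

def flatness(coeffs):
    """Assumed flatness measure: sup minus inf of the sampled performance."""
    values = helton_performance(coeffs)
    return values.max() - values.min()

# Baseline h = 0: Gamma = |0.8 + z^{-2}|^2 ranges between 0.2^2 and 1.8^2.
gamma_zero = objective([0.0])
flat_zero = flatness([0.0])
```

Passing `objective` to a general-purpose minimizer over the coefficient vector would give a suboptimal polynomial solution to compare against the Helton-Merino bound.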


[Figure 13 plot: $\Gamma(z,h_{\min}(z))$ for $h_{\min} \in \mathcal{P}^{11}$.]

Figure 13: Flatness of the performance function.

For completeness, Figure 14 plots the variation of the performance function:

$$\partial_2\Gamma(z,h(z)) = 2(z^{-1} + h(z))\,\overline{\left(0.8 + (z^{-1} + h(z))^2\right)}.$$

The winding number of the variation is

$$\mathrm{Wind}[\partial_2\Gamma(z,h_{\min}(z))] = 1$$

so that the alignment condition is satisfied. The relative flatness and alignment of $h_{\min}$ led us to suspect that $h_{\min}$ is close to the disk algebra minimizer

$$h_{\mathcal{A}} := \mathrm{argmin}\{\gamma(h) : h \in \mathcal{A}(\mathbb{D})\}.$$

[Figure 14 plot: $\partial_2\Gamma(z,h_{\min}(z))$ for $h_{\min} \in \mathcal{P}^{11}$; annotation: $\gamma(h_{\min}) = 1.0149$; $\mathrm{flat}(h_{\min}) = 0.050285$; winding number $= 1$.]
Figure 14: Winding number of $\partial_2\Gamma(z,h_{\min}(z))$.

Moreover, the minimizers computed from increasing the degree of the polynomials should be "converging" to the disk algebra minimizer:

$$h_N := \mathrm{argmin}\{\gamma(h) : h \in \mathcal{P}^N\} \to h_{\mathcal{A}}.$$

Figure 15 exemplifies this belief by plotting the performance $\gamma(h_N)$ as a function of the degree $N$ of the polynomials. The plot shows that near-optimal performance is achieved on the polynomials of degree $N > 25$. Thus, knowing the best bound from the Helton-Merino computations provides the critical stopping point. Indeed, the minimum at $N = 29$ is starting to creep beneath the Helton-Merino bound. It is not that the Helton-Merino bound is incorrect; this creep is caused by over-interpolating on the finite samples of the unit circle [24]. Nevertheless, Figure 15 graphically raises the fundamental question:

Question 1 How do the finite-dimensional but realizable minimizers $h_N$ approximate the disk-algebra minimizer $h_{\mathcal{A}}$?

[Figure 15 plot: $\min\{\gamma(h) : h \in \mathcal{P}^N\}$ (vertical axis, ticks from 1.02 to 1.12) against $N$, the degree of the polynomials (ticks from 5 to 25); annotation: min = 1.0005701 for $N = 29$.]


Figure 15: Performance as a function of the degree of the polynomials.

In particular, relating the rate of convergence of $h_N \to h_{\mathcal{A}}$ to the smoothness of $\Gamma(z,h(z))$ offers a fascinating research opportunity to insert extrapolation techniques into $H^\infty$ Theory. Likewise, the flatness and winding number conditions offer additional measurements of the quality of a suboptimal solution, which raises the classic question regarding a suboptimal solution:

$$\gamma(h_{\Delta\gamma}) \leq \gamma_{\mathcal{A}} + \Delta\gamma.$$

Assuming convergence does happen, Question 2 asks:

Question 2 How fast does $h_{\Delta\gamma}$ converge to $h_{\mathcal{A}}$ as $\Delta\gamma \to 0$?

However, the far more useful question is far more difficult, particularly when $h_{\Delta\gamma}$ is known only on a finite number of points on the unit circle:

Question 3 Suppose $\{z_k\}$ is a dense sampling of the unit circle:

$$z_k = e^{j2\pi k/K} \qquad (k = 0,\ldots,K-1),$$

where $K \gg 1$. Assume on this sampling of the unit circle,

$$\Gamma(z_k,h_{\Delta\gamma}(z_k)) \leq \gamma_{\mathcal{A}} + \Delta\gamma.$$

How far is $h_{\Delta\gamma}$ from $h_{\mathcal{A}}$?

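These questions are the standard ones. Helton and Merino [17, page 141] exploit the flatness condition to measure the quality of a suboptimal solution:

$$\mathrm{flat}(h_{\min}) := \sup\{\Gamma(z,h_{\min}(z)) : z \in \mathbb{T}\} - \inf\{\Gamma(z,h_{\min}(z)) : z \in \mathbb{T}\}.$$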

As the flatness tends to zero, the performance function tends to a constant value. So, the performance of a suboptimal solution and its flatness are multiple criteria for the quality of this numerical solution.

For example, Figure 16 displays the performance of a suboptimal solution from the polynomials of degree 11. The figure shows a worse performance than reported from Figure 13, but better flatness.

[Figure 16 plot: horizontal axis $\theta$ (deg) from 0 to 180; annotation: $\gamma(h_{\min}) = 1.0247$; $\mathrm{flat}(h_{\min}) = 0.046191$.]

Figure 16: Flatness of the performance function.

Table 2 summarizes the performance of the suboptimal solutions from the polynomials of degree 11. The table shows that one solution attains a smaller objective but worse flatness. Consequently, the engineer can trade off the objective function against the flatness function. The formal mathematical approach to these engineering trade-offs is multiobjective optimization.


Table 2: Assessing suboptimal solutions from $\mathcal{P}^{11}$.

              Table 12.1 [17]   Figure 13   Figure 16
Performance   1.000582          1.0149      1.0274
Flatness      0.0020            0.0503      0.0462


6 Multiobjective Optimization


Multiobjective optimization is a powerful tool to trade off competing objectives. The objective functions are stacked in a vector and vector-valued optimization is undertaken. The beauty of this approach is that the impossible problems of simultaneously rationalizing units and adjusting scaling factors are avoided. Introduce the partial order on $\mathbb{R}^N$ by declaring

$$u \leq v \iff v - u \in \mathbb{R}^N_+,$$

where $\mathbb{R}^N_+$ denotes the closed positive orthant

$$\mathbb{R}^N_+ := \{y \in \mathbb{R}^N : y_n \geq 0\}.$$

Let $\gamma : X \subseteq \mathbb{R}^M \to \mathbb{R}^N$ denote the mapping

$$\gamma(x) := \begin{bmatrix} \gamma_1(x) \\ \gamma_2(x) \\ \vdots \\ \gamma_N(x) \end{bmatrix}.$$

Each $\gamma_n$ is called an objective function so that $\gamma$ is called a multiobjective function. We want to solve the vector-valued minimization of $\gamma$ on $X$. Boyd and Vandenberghe [6, page 20] have generalized the notion of a "minimizer." Denote the image of $X$ under $\gamma$ by

$$\gamma(X) := \{\gamma(x) : x \in X\}.$$

Any $\gamma(x) \in \gamma(X)$ is called a minimum element of $\gamma(X)$, provided

$$\gamma(x) \leq \gamma(x')$$

for all $x' \in X$. A convenient notation for this inequality between a point $\gamma(x)$ and the set of all the $\gamma(x')$'s is

$$\gamma(x) \leq \gamma(X).$$

The key to a minimum element is that the inequality holds on all the image $\gamma(X)$. Equivalently, this inequality states that attaching the positive orthant at $\gamma(x)$ will subsume all the elements in $\gamma(X)$:

$$\gamma(X) \subseteq \gamma(x) + \mathbb{R}^N_+.$$

Figure 17 illustrates the geometry of the minimum element in $\mathbb{R}^2$ and offers a strong geometric proof that the minimum element is unique.


Figure 17: The minimum element.


Not all sets admit a minimum element. More commonly, we look for minimal elements as illustrated in Figure 18. Any $\gamma(x) \in \gamma(X)$ is a minimal element of $\gamma(X)$, provided [6, page 21]:

$$\gamma(y) \leq \gamma(x) \implies \gamma(y) = \gamma(x).$$

Figure 18 shows this is equivalent² to

$$(\gamma(x) - \mathbb{R}^N_+) \cap \gamma(X) = \{\gamma(x)\}.$$
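For a finite sample of the image $\gamma(X)$, both definitions can be tested directly. The sketch below is an illustration under the componentwise order defined above; the function names and the sample points are hypothetical.

```python
import numpy as np

def minimal_elements(points):
    """Return the rows of `points` that are minimal for the componentwise order."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p fails to be minimal if some q <= p componentwise with q != p.
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

def minimum_element(points):
    """Return the minimum element if the sample has one, else None."""
    pts = np.asarray(points, dtype=float)
    candidate = pts.min(axis=0)              # componentwise lower bound
    is_member = np.any(np.all(pts == candidate, axis=1))
    return candidate if is_member else None

sample = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [3.0, 3.0]]
pareto_image = minimal_elements(sample)      # three mutually incomparable points
no_minimum = minimum_element(sample)         # the bound [1, 1] is not in the sample
has_minimum = minimum_element([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
```

The sample illustrates the remark that not all sets admit a minimum element: it has three minimal elements and no minimum element.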


Figure  18:  A  minimal  element. 

² Less restrictive is the notion of weak minimizers [10]: $(\gamma(x) - \mathrm{int}[\mathbb{R}^N_+]) \cap \gamma(X) = \emptyset$.

These definitions occur in the range of $\gamma : X \subseteq \mathbb{R}^M \to \mathbb{R}^N$. In the domain of $\gamma$, any $x \in X$ is called Pareto optimal, provided $\gamma(x)$ is a minimal element of $\gamma(X)$ [6, page 102]. Figure 19 illustrates all minimal elements, or the images of the Pareto optima, as the dark line on the boundary of $\gamma(X)$. Regardless of the shape of $\gamma(X)$, finding its Pareto set is fundamental. From Das and Dennis [12]:

“Very often in engineering applications, the desired result helpful in facilitating design is a whole collection of Pareto optimal points, representative of the entire spectrum of efficient solutions. Thus, ideally, the desired solution is the entire Pareto optimal set.”


In  summary, 


Computing  all  Pareto  optima  is  the  Fundamental  Goal 
of  Multiobjective  Optimization. 


Of the many multiobjective optimization schemes, the Goal Attainment Method is well-suited for a wide range of applications. Figure 20 illustrates the method. The user specifies a vector of design goals $\gamma_u$ such that

$$\gamma_u \leq \gamma(X)$$

and a vector of non-negative weights $w$. The minimizer attempts to shoot from $\gamma_u$ along the direction of the weight vector $w$ and hit the boundary of $\gamma(X)$. The stopping point, if it exists, may be a minimal element of $\gamma(X)$.

Figure 20: The Goal Attainment Method.

The Goal Attainment Method [8]. Given $\gamma : X \subseteq \mathbb{R}^M \to \mathbb{R}^N$. Select a design goal $\gamma_u \leq \gamma(X)$. Select a weight vector $w \in \mathbb{R}^N_+$.

minimize $\{t \in \mathbb{R}\}$

subject to $x \in X$ and

$$\gamma(x) - tw \leq \gamma_u.$$
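When $X$ is replaced by a finite set of candidates, the Goal Attainment problem can be solved by enumeration: for a fixed candidate, the smallest feasible $t$ is the largest weighted excess over the goal. The sketch below illustrates this reduction. It is not the solver of [8]; the function name is hypothetical and the weights are assumed to be strictly positive.

```python
import numpy as np

def goal_attainment(image_points, goal, weights):
    """Solve min t subject to gamma(x) - t*w <= goal over a finite candidate set.

    image_points: array of shape (K, N) whose rows are gamma(x_k).
    Returns (index of the best candidate, attained t).
    """
    gam = np.asarray(image_points, dtype=float)
    goal = np.asarray(goal, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("this sketch assumes strictly positive weights")
    # For a fixed candidate the smallest feasible t is max_n (gamma_n - goal_n) / w_n.
    t_per_candidate = np.max((gam - goal) / w, axis=1)
    best = int(np.argmin(t_per_candidate))
    return best, float(t_per_candidate[best])

image = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [3.0, 3.0]]
equal = goal_attainment(image, goal=[0.0, 0.0], weights=[1.0, 1.0])
tilted = goal_attainment(image, goal=[0.0, 0.0], weights=[1.0, 3.0])
```

Changing the weight vector moves the stopping point along the boundary of the image, which is how the method trades one objective against another.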

Das and Dennis [12] introduce the normal-boundary intersection (NBI) method for computing the Pareto set using the global minimizers:

$$x_n = \mathrm{argmin}\{\gamma_n(x) : x \in X\} \qquad (n = 1,\ldots,N).$$

These minimizers determine the utopic point

$$\gamma_\diamond := \begin{bmatrix} \gamma_1(x_1) \\ \gamma_2(x_2) \\ \vdots \\ \gamma_N(x_N) \end{bmatrix}$$

that is a pseudo-minimum of $\gamma(X)$. Figure 21 shows that the utopic point is within the "line of sight" of the Pareto points by shooting along the weight vector $w \geq 0$. The claim is that by setting $\gamma_u = \gamma_\diamond$ and varying the weight vector $w \geq 0$, a superset of the Pareto set can be computed. In practice, we can only sample this superset. Das and Dennis [12] point out that this sampling may not be uniformly distributed. Their key claim is that the NBI method produces a uniform sampling of the Pareto set. Thus, practical questions can be raised about the efficacy of the multiobjective minimizers.

Figure 21: The utopic point $\gamma_\diamond$.


Figure 18 shows that Pareto optimality is a global definition. All of the image

$$\gamma(X) = \{\gamma(x) : x \in X\}$$

must be tested. A local notion of Pareto optimality will be needed.

Definition 1 An element $h_p \in \mathcal{H}$ is called locally Pareto-optimal, provided a neighborhood $U$ of $h_p$ exists such that for all $h \in U$

$$\gamma(h) \leq \gamma(h_p) \implies \gamma(h) = \gamma(h_p).$$

Equivalently, $h_p \in \mathcal{H}$ is not locally Pareto-optimal if a sequence $\Delta h_k \in \mathcal{H}$ exists that converges to zero and satisfies

$$\gamma_m(h_p + \Delta h_k) \leq \gamma_m(h_p) \qquad (m = 1,\ldots,M),$$

with strict inequality for at least one index. The image of a Pareto point lies on the boundary of $\gamma(\mathcal{H})$. The image of a local Pareto point may actually lie in the interior of $\gamma(\mathcal{H})$.


7  The  Multiobjective  Kolmogorov  Criterion 

The  Kolmogorov  Criterion  generalizes  to  the  multiobjective  problem.  The  first  result 
is  that  a  direction  of  descent  exists  at  points  that  are  not  locally  Pareto-optimal. 

Lemma 9 (Multiobjective Descent) Let $Z \subset \mathbb{C}$ be compact. Let $\mathcal{H}$ be a closed linear subspace of $C(Z,\mathbb{C})$. Let $U$ be an open subset containing $Z$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}^M$ be $C^2$. Define $\gamma : \mathcal{H} \to \mathbb{R}^M$ by

$$\gamma(h) := \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \\ \vdots \\ \gamma_M(h) \end{bmatrix}; \qquad \gamma_m(h) := \sup\{\Gamma_m(z,h(z)) : z \in Z\}.$$

Assume that $\mathcal{H}$ is boundedly compact. If $h \in \mathcal{H}$ is not locally Pareto-optimal, a nonzero $\Delta h \in \mathcal{H}$ exists such that

$$0 \geq \Re[\partial_2\Gamma_m(x,h(x))\,\Delta h(x)] \qquad (x \in \mathrm{crit}[\gamma_m(h)])$$

for $m = 1,\ldots,M$.

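Proof: If $h \in \mathcal{H}$ is not locally Pareto-optimal, a sequence $\{\Delta h_k\} \subset \mathcal{H}$ exists that converges to zero for which

$$\gamma_m(h + \Delta h_k) \leq \gamma_m(h)$$

with strict inequality in at least one index. Let $t_k := \|\Delta h_k\|_\infty$ and set $u_k := t_k^{-1}\Delta h_k$. By selecting a subsequence, the bounded compactness of $\mathcal{H}$ asserts the existence of a limit point: $u_k \to \Delta h \in \mathcal{H}$. By construction, $\Delta h$ is nonzero. For all $x \in \mathrm{crit}[\gamma_m(h)]$, Lemma 2 provides the expansion:

$$\begin{aligned}
\gamma_m(h) \geq \gamma_m(h + \Delta h_k) &\geq \Gamma_m(x,h(x) + \Delta h_k(x)) \\
&= \Gamma_m(x,h(x)) + \Re[\partial_2\Gamma_m(x,h(x))\,\Delta h_k(x)] + O[t_k^2] \\
&= \gamma_m(h) + \Re[\partial_2\Gamma_m(x,h(x))\,\Delta h_k(x)] + O[t_k^2].
\end{aligned}$$

Subtract $\gamma_m(h)$ from both sides, then divide by $t_k > 0$ to get

$$0 \geq \Re[\partial_2\Gamma_m(x,h(x))\,u_k(x)] + O[t_k].$$

Taking the limit as $k \to \infty$ gives

$$0 \geq \Re[\partial_2\Gamma_m(x,h(x))\,\Delta h(x)] \qquad (x \in \mathrm{crit}[\gamma_m(h)])$$

for $m = 1,\ldots,M$. ///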

Roughly speaking, this lemma provides a "candidate" for a direction of descent at those points not locally Pareto-optimal. The quotes are used because the linearization may not have enough information to dominate the function in the neighborhood of the point. If the derivative does not vanish, this problem is eliminated and the following Multiobjective Minimization Test is available.




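Lemma 10 (Multiobjective Minimization Test) Let $Z \subset \mathbb{C}$ be compact. Let $\mathcal{H}$ be a closed linear subspace of $C(Z,\mathbb{C})$. Let $U$ be an open subset containing $Z$. Let $\Gamma : U \times \mathbb{C} \to \mathbb{R}^M$ be $C^2$. Define $\gamma : \mathcal{H} \to \mathbb{R}^M$ by

$$\gamma(h) := \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \\ \vdots \\ \gamma_M(h) \end{bmatrix}; \qquad \gamma_m(h) := \sup\{\Gamma_m(z,h(z)) : z \in Z\}.$$

Let $h \in \mathcal{H}$. If there exists a $\Delta h \in \mathcal{H}$ such that

$$0 > \Re[\partial_2\Gamma_m(x,h(x))\,\Delta h(x)] \qquad (x \in \mathrm{crit}[\gamma_m(h)]),$$

for $m = 1,\ldots,M$, then $h \in \mathcal{H}$ is not locally Pareto-optimal for $\gamma$.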
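On a sampled domain, the Multiobjective Minimization Test reduces to a sign check on each critical set. The following sketch is illustrative: the helper names, the tolerance that defines the sampled critical set, and the test case (the constant function $h = 1$ with the two performance functions of the $\exp(\pm x)$ problem from Section 8.1) are assumptions chosen for the example.

```python
import numpy as np

def critical_set(values, tol=1e-9):
    """Boolean mask of sample points where `values` is within tol of its maximum."""
    values = np.asarray(values, dtype=float)
    return values >= values.max() - tol

def passes_minimization_test(performances, derivatives, delta_h, tol=1e-9):
    """True if 0 > Re[d2Gamma_m * delta_h] on every sampled critical set."""
    delta_h = np.asarray(delta_h, dtype=complex)
    for gamma_m, d2_m in zip(performances, derivatives):
        crit = critical_set(gamma_m, tol)
        directional = np.real(np.asarray(d2_m, dtype=complex) * delta_h)
        if np.any(directional[crit] >= 0):
            return False
    return True

x = np.linspace(0.0, 1.0, 201)
h = np.ones_like(x)                                   # trial function h(x) = 1
performances = [(np.exp(x) - h) ** 2, (np.exp(-x) - 1.0 / h) ** 2]
derivatives = [-2.0 * (np.exp(x) - h), 2.0 * (np.exp(-x) - 1.0 / h) / h ** 2]
raise_h = passes_minimization_test(performances, derivatives, np.ones_like(x))
lower_h = passes_minimization_test(performances, derivatives, -np.ones_like(x))
```

Here both critical sets sit at $x = 1$ with negative derivatives, so raising $h$ is a direction of descent for both objectives and $h = 1$ is not locally Pareto-optimal.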


Examination of both results reveals a new phenomenon. For clarity, consider the real-valued case in $C([0,1],\mathbb{R})$. Suppose $h \in \mathcal{H}$ admits a direction of descent $\Delta h \in \mathcal{H}$. The Multiobjective Descent Lemma (Lemma 9) forces

$$0 \geq \partial_2\Gamma_m(x,h(x))\,\Delta h(x) \qquad (x \in \mathrm{crit}[\gamma_m(h)])$$

for $m = 1,\ldots,M$. What if two critical sets share a common element? Specifically, suppose $x_\pm \in \mathrm{crit}[\gamma_{m_1}(h)] \cap \mathrm{crit}[\gamma_{m_2}(h)]$ with differing signs:

$$0 > \partial_2\Gamma_{m_1}(x_\pm,h(x_\pm)) \quad \text{and} \quad 0 < \partial_2\Gamma_{m_2}(x_\pm,h(x_\pm)).$$

This forces $\Delta h(x_\pm) = 0$. This phase splitting of the differential requires some consideration and is best approached through an example on the polynomials.


8 Multiobjective Optimization on $\mathcal{P}^N$

Phase splitting forces additional constraints that depend on local smoothness. Examining a few examples is worthwhile before setting out a general theory.

8.1  Approximating  exp(±x) 

The problem is simultaneous polynomial approximation: Find a polynomial

$$h(x) = h_0 + h_1 x + h_2 x^2 + h_3 x^3$$

that fits the exponential function and its reciprocal:

$$e^{x} \approx h(x), \qquad e^{-x} \approx h(x)^{-1}.$$

Choose the performance functions as follows:

$$\Gamma_1(x,h(x)) = (e^{x} - h(x))^2, \qquad \Gamma_2(x,h(x)) = (e^{-x} - h(x)^{-1})^2.$$

The objective functions are

$$\gamma_1(h) := \sup\{\Gamma_1(x,h(x)) : x \in [0,1]\}, \qquad \gamma_2(h) := \sup\{\Gamma_2(x,h(x)) : x \in [0,1]\}.$$

The goal is to minimize the multiobjective function

$$\gamma(h) := \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \end{bmatrix}$$

on the nonlinear subset $\mathcal{H} \subset \mathcal{P}^3$ consisting of those polynomials that never vanish on the unit interval.
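The two objectives are easy to evaluate on a grid. The sketch below is an illustration, with the grid size, the function name, and the use of the cubic Taylor polynomial as a trial point all being assumptions. A polynomial that vanishes or changes sign on the grid is rejected as lying outside $\mathcal{H}$.

```python
import numpy as np

X_GRID = np.linspace(0.0, 1.0, 1001)   # sampling of the unit interval

def objectives(coeffs, x=X_GRID):
    """Return (gamma_1, gamma_2) for h(x) = h0 + h1 x + h2 x^2 + h3 x^3.

    Returns None if h vanishes or changes sign on the grid.
    """
    h = np.polynomial.polynomial.polyval(x, coeffs)
    if np.any(h == 0) or np.any(np.sign(h) != np.sign(h[0])):
        return None
    gamma_1 = np.max((np.exp(x) - h) ** 2)
    gamma_2 = np.max((np.exp(-x) - 1.0 / h) ** 2)
    return gamma_1, gamma_2

taylor = [1.0, 1.0, 0.5, 1.0 / 6.0]     # cubic Taylor polynomial of exp(x)
gamma_taylor = objectives(taylor)
rejected = objectives([0.0, 1.0, 0.0, 0.0])   # h(x) = x vanishes at x = 0
```

Evaluating `objectives` on many random coefficient vectors gives the kind of scatter plot of $\gamma(\mathcal{H})$ that Figure 22 describes.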

Figure 22 sketches out $\gamma(\mathcal{H})$. The blue dots plot the value of $\gamma(h)$ on random polynomials $h \in \mathcal{H}$. The red square is the starting point for the numerical minimizing method. The diamond marks where the method terminated.

Figure 22: Random values of $\gamma(h)$.

The minimizer was computed using the Goal Attainment method with equal weights and goals:

7 (h)  ~  <  7«;  w  = 

Figure 23 plots the error curves at this numerical minimizer. To understand this plot, compute the partials of each performance function:

$$\partial_2\Gamma_1(x,h(x)) = -2(e^{x} - h(x)), \qquad \partial_2\Gamma_2(x,h(x)) = +2(e^{-x} - h(x)^{-1})\,h(x)^{-2}.$$
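These partial derivatives can be checked against central finite differences. The sketch below is only a sanity check; the test values of $h$ and the step size are arbitrary choices.

```python
import numpy as np

def gamma1(x, h):
    return (np.exp(x) - h) ** 2

def gamma2(x, h):
    return (np.exp(-x) - 1.0 / h) ** 2

def d2_gamma1(x, h):
    """Partial derivative of gamma1 with respect to its second argument."""
    return -2.0 * (np.exp(x) - h)

def d2_gamma2(x, h):
    """Partial derivative of gamma2 with respect to its second argument."""
    return 2.0 * (np.exp(-x) - 1.0 / h) / h ** 2

x = np.linspace(0.0, 1.0, 11)
h = 1.0 + 0.9 * x                 # arbitrary nonvanishing test values
eps = 1e-6
fd1 = (gamma1(x, h + eps) - gamma1(x, h - eps)) / (2 * eps)
fd2 = (gamma2(x, h + eps) - gamma2(x, h - eps)) / (2 * eps)
err1 = np.max(np.abs(fd1 - d2_gamma1(x, h)))
err2 = np.max(np.abs(fd2 - d2_gamma2(x, h)))
```

Both errors should be at the level of rounding noise, which confirms the signs that drive the alternation argument.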

A minimizer of $\gamma_1(h)$ on $\mathcal{H} \subset \mathcal{P}^3$ is characterized whenever $\partial_2\Gamma_1(x,h(x))$ exhibits an alternation sequence of length 5, which forces the error function $-(e^{x} - h(x))$ to alternate five times. The upper panel of Figure 23 shows this alternation: $\gamma_1(h_\diamond)$ is at its minimal value. Because of the unicity of best polynomial approximations, any nonzero perturbation of $h_\diamond$ degrades the performance $\gamma_1$ [11]:

$$0 \neq \Delta h \in \mathcal{P}^3 \implies \gamma_1(h_\diamond) < \gamma_1(h_\diamond + \Delta h).$$


[Figure 23 plot: error curves at the numerical minimizer; label $\gamma_u = 10^{-5}$; vertical axes scaled by $10^{-4}$; lower panel title $-(e^{-x} - 1/h_{\min}(x))$.]

Consequently, any nonzero perturbation of $h_\diamond$ that improves the second objective:

$$\gamma_2(h_\diamond + \Delta h) < \gamma_2(h_\diamond)$$

must degrade the first objective; that is, $\gamma(h_\diamond)$ is a minimal element of $\gamma(\mathcal{H})$ and $h_\diamond$ is Pareto optimal. Likewise, a minimizer of $\gamma_2(h)$ on $\mathcal{H} \subset \mathcal{P}^3$ is also characterized when the error function $(e^{-x} - h(x)^{-1})$ alternates five times. The lower panel of Figure 23 shows that $\gamma_2(h_\diamond)$ has only one critical point, at the endpoint of the unit interval. Although not an example of phase splitting, the figure does show that the critical sets of the individual objective functions can easily have common elements.

8.2  Characterization 

For multiobjective optimization on the polynomials, the alternating condition now expands to include all the critical sets of the objective functions while phase splitting forces zeros into the "tangent space" of the objective function. In the polynomials, the alternation and phase splitting balance out. Define

$$\mathrm{crit}[\gamma(h)] = \bigcup_{m=1}^{M} \mathrm{crit}[\gamma_m(h)].$$

Let $\mathrm{crit}_\pm[\gamma(h)]$ denote those critical points for which the differential phase splits. Formally, $x_\pm \in \mathrm{crit}_\pm[\gamma(h)]$ provided $x_\pm \in \mathrm{crit}[\gamma_{m_1}(h)] \cap \mathrm{crit}[\gamma_{m_2}(h)]$ and

$$0 > \partial_2\Gamma_{m_1}(x_\pm,h(x_\pm))\,\partial_2\Gamma_{m_2}(x_\pm,h(x_\pm)).$$


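Lemma 11 Suppose $\Gamma : [0,1] \times \mathbb{R} \to \mathbb{R}^M$ is $C^2$. Define $\gamma : C([0,1],\mathbb{R}) \to \mathbb{R}^M$ by

$$\gamma(h) := \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \\ \vdots \\ \gamma_M(h) \end{bmatrix}; \qquad \gamma_m(h) := \sup\{\Gamma_m(x,h(x)) : x \in [0,1]\}.$$

Let $h \in \mathcal{P}^N$. Assume $\partial_2\Gamma_m(h) \neq 0$ on $\mathrm{crit}[\gamma_m(h)]$. On $\mathrm{crit}[\gamma(h)] \setminus \mathrm{crit}_\pm[\gamma(h)]$, define

$$s(x) := \begin{cases} +1 & x \in \mathrm{crit}[\gamma_m(h)],\ \partial_2\Gamma_m(x,h(x)) > 0 \\ -1 & x \in \mathrm{crit}[\gamma_m(h)],\ \partial_2\Gamma_m(x,h(x)) < 0. \end{cases}$$

If $s(x)$ alternates at least $N + 2 - |\mathrm{crit}_\pm[\gamma(h)]|$ times, then $h \in \mathcal{P}^N$ is a local Pareto point of $\gamma$.

Proof: Let $N_c := |\mathrm{crit}_\pm[\gamma(h)]|$ and assume $s(x)$ alternates at least $N + 2 - N_c$ times. Suppose that $h \in \mathcal{P}^N$ is not locally Pareto-optimal. Lemma 9 furnishes $\Delta h \in \mathcal{P}^N$ that is nonzero and

$$0 \geq \partial_2\Gamma_m(x,h(x))\,\Delta h(x) \qquad (x \in \mathrm{crit}[\gamma_m(h)])$$

for $m = 1,\ldots,M$. Because $\partial_2\Gamma_m(x,h(x))$ does not vanish on $\mathrm{crit}[\gamma_m(h)]$, it follows that $\Delta h$ must vanish on $\mathrm{crit}_\pm[\gamma(h)]$. Factor $\Delta h(x)$ as

$$\Delta h(x) = p(x)\,\widetilde{\Delta h}(x)$$

where $p \in \mathcal{P}^{N_c}$ contains the zeros of $\mathrm{crit}_\pm[\gamma(h)]$ and $\widetilde{\Delta h}(x) \in \mathcal{P}^{N - N_c}$ is zero-free on $\mathrm{crit}_\pm[\gamma(h)]$. However, $0 \geq s(x)\,\widetilde{\Delta h}(x)$ on $\mathrm{crit}[\gamma(h)] \setminus \mathrm{crit}_\pm[\gamma(h)]$. The $N + 2 - N_c$ alternations of $s(x)$ force $\widetilde{\Delta h}$ to have at least $N + 1 - N_c$ zeros. Consequently, $\widetilde{\Delta h}$ must be zero, which forces $\Delta h = 0$ and contradicts that $\Delta h$ is nonzero. Thus, $h \in \mathcal{H}$ must be a local Pareto point. ///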
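The alternation count in Lemma 11 is simple to automate once the signs $s(x)$ have been read off the critical points in increasing order of $x$. The sketch below is a hypothetical helper, not part of the report. It interprets "alternates $k$ times" as an alternating sequence of length $k$, matching the "alternation sequence of length 5" quoted for $\mathcal{P}^3$ in Section 8.1.

```python
def alternation_length(signs):
    """Length of the longest alternating subsequence of a +1/-1 sequence.

    Consecutive equal signs are merged, so the result is the number of
    sign changes plus one (0 for an empty sequence).
    """
    s = [int(v) for v in signs if v != 0]
    if not s:
        return 0
    changes = sum(1 for a, b in zip(s, s[1:]) if a != b)
    return changes + 1

def satisfies_lemma_11(signs, degree, num_phase_split):
    """Sufficient test: alternation length >= N + 2 - |crit_pm|."""
    return alternation_length(signs) >= degree + 2 - num_phase_split

full = satisfies_lemma_11([+1, -1, +1, -1, +1], degree=3, num_phase_split=0)
short = satisfies_lemma_11([+1, -1, +1], degree=3, num_phase_split=0)
split = satisfies_lemma_11([+1, -1, +1], degree=3, num_phase_split=2)
```

The last call shows the trade described in the lemma: each phase-split critical point lowers the required alternation length by one.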

The  beauty  of  Lemma  11  is  that  all  the  critical  sets  contribute  to  the  alternating 
sequence — decreased  by  the  phase  splitting.  Although  phase  splitting  obscures  the 
converse,  we  have  enough  machinery  to  explore  the  Pareto  sets. 


8.3  The  Pareto  Set  of  exp(±x) 


Section 6 pointed out that the fundamental problem of multiobjective optimization is computing the Pareto set of $\gamma : \mathcal{H} \to \mathbb{R}^N$. Recall that the Pareto set resides in $\mathcal{H}$. Consequently, the Pareto set depends on the parameterization of $\mathcal{H}$ and $\gamma$. From this computational point of view, the Pareto set is difficult to visualize and to use in engineering trade-offs. In contrast, the Pareto image, the set of all the Pareto points mapped by $\gamma$ into $\mathbb{R}^N$, is far more practical and computationally available. Typically, the range of $\gamma$ has low dimension ($N \leq 3$) so that the engineer can see the performance and make decisions about trade-offs.

How to get the Pareto image when its Pareto set is unknown is an excellent question. Because the Pareto image consists of the minimal elements of $\gamma(\mathcal{H})$, we can "sketch" $\gamma(\mathcal{H})$ by randomly sampling $h \in \mathcal{H}$ and plotting the random points $\gamma(h)$. As the sampling gets denser, the image $\gamma(\mathcal{H})$ starts to fill in and the boundary containing the minimal elements starts to appear.


For example, consider the objective function of Section 8.1 that approximates
the exponential function and its reciprocal by polynomials $h \in \mathcal{P}_3$. The
objective function

$$\gamma(h) = \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \end{bmatrix}$$

consists of the performance functions

$$\Gamma_1(x, h(x)) = (e^{x} - h(x))^2, \qquad \Gamma_2(x, h(x)) = (e^{-x} - h(x)^{-1})^2$$

that have variation

$$\partial_2\Gamma_1(x, h(x)) = -2\,(e^{x} - h(x)), \qquad \partial_2\Gamma_2(x, h(x)) = +2\,(e^{-x} - h(x)^{-1})\,h(x)^{-2}.$$


The domain $\mathcal{H}$ of $\gamma$ is the nonvanishing polynomials of $\mathcal{P}_3$.
Figure 24 sketches the image of $\gamma$. The plot is a closeup of Figure 22. The blue
dots in the upper right are $\gamma(h)$ for random polynomials in $\mathcal{P}_3$. The
red square marks the starting point for the minimizing method. The green diamonds
are the method's terminal points.
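A minimal sketch of how such a cloud of blue dots could be generated is given below. Several choices are assumptions made for illustration and are not taken from the report: the interval $[-1, 1]$, the 201-point grid that replaces the supremum by a maximum, the restriction to strictly positive $h$, and the sampling of coefficients near the Taylor polynomial of $e^x$. The names `polyval`, `gamma`, `GRID`, and `TAYLOR` are likewise hypothetical.

```python
import math
import random

GRID = [-1.0 + 2.0 * i / 200 for i in range(201)]  # assumed interval [-1, 1]

def polyval(c, x):
    """Horner evaluation; c = (c0, c1, c2, ...) multiplies 1, x, x^2, ..."""
    y = 0.0
    for ck in reversed(c):
        y = y * x + ck
    return y

def gamma(c):
    """Return (gamma_1, gamma_2) on the grid, or None if h is not safely positive."""
    vals = [polyval(c, x) for x in GRID]
    if min(vals) <= 1e-6:  # simplifying assumption: keep only positive h
        return None
    g1 = max((math.exp(x) - v) ** 2 for x, v in zip(GRID, vals))
    g2 = max((math.exp(-x) - 1.0 / v) ** 2 for x, v in zip(GRID, vals))
    return g1, g2

TAYLOR = (1.0, 1.0, 0.5, 1.0 / 6.0)  # cubic Taylor polynomial of exp(x)
rng = random.Random(1)
cloud = []
for _ in range(500):
    c = tuple(ck + rng.uniform(-0.2, 0.2) for ck in TAYLOR)
    g = gamma(c)
    if g is not None:
        cloud.append(g)  # one sampled point gamma(h) in the objective plane
```

Each entry of `cloud` is one point $\gamma(h)$; a scatter plot of the entries gives a rough picture of the image of $\gamma$ near the Taylor polynomial.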


Figure 24: Estimating the Pareto points for $\gamma(\mathcal{P}_3)$.

The Goal Attainment Method computed the minimizers using the weight vector

$$w = \begin{bmatrix} \cos(\theta_w) \\ \sin(\theta_w) \end{bmatrix}.$$

As $\theta_w$ sweeps from 0° to 90°, the Goal Attainment Method sweeps out points
that are numerically local Pareto points, with images marked by the green diamonds.
This numerical approximation of the Pareto image allows an engineer to see the
trade-off between the objective functions. Figure 24 also numbers selected points.
The following plots discuss the Pareto condition for each numbered point.
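The role of the weight vector can be illustrated with the standard goal attainment scalarization: find the smallest $t$ such that $\gamma_k(h) - w_k t \le g_k$ for every objective $k$. The sketch below makes several assumptions that the report does not state: the goal vector $g$ is taken to be the origin, and the constrained optimizer is replaced by a crude search over a finite cloud of objective values. The names `attainment`, `sweep`, and the toy `cloud` are hypothetical.

```python
import math

def attainment(gam, goal, w):
    """Smallest t with gam[k] - w[k] * t <= goal[k] for all k (inf if impossible)."""
    t = -math.inf
    for gk, g0, wk in zip(gam, goal, w):
        if wk > 0.0:
            t = max(t, (gk - g0) / wk)
        elif gk > g0:  # zero weight acts as a hard constraint gam[k] <= goal[k]
            return math.inf
    return t

def sweep(cloud, goal=(0.0, 0.0), angles_deg=range(5, 90, 10)):
    """For each weight angle, pick the sampled point with the best attainment."""
    picks = []
    for a in angles_deg:
        th = math.radians(a)
        w = (math.cos(th), math.sin(th))
        picks.append(min(cloud, key=lambda p: attainment(p, goal, w)))
    return picks

# Toy cloud: a linear trade-off curve plus two dominated points.
cloud = [(i / 10, 1 - i / 10) for i in range(11)] + [(0.9, 0.9), (0.6, 0.7)]
picks = sweep(cloud)
```

With the goal at the origin, each angle selects the sample closest to the ray from the goal in the direction $w$, so sweeping the angle walks along the trade-off curve. A real run would hand the same scalarization to a constrained optimizer over the polynomial coefficients instead of searching a fixed cloud.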

Figure 25 shows the error curves of Point #1. The red segments mark the critical
set regions. The numbers on the right are the coefficients of the polynomial. The
lower plot exhibits an alternating sequence of length 5. Lemma 11 then shows that
this polynomial is indeed locally Pareto-optimal.

Figure 25: Pareto Test for Point #1.

Figure 26 shows the error curves of Point #2. The lower plot now exhibits an
alternating sequence of length 3, while the upper plot picks up an alternating
sequence of length 2, in phase with the lower plot, to give a generalized alternating
sequence of length 5. Lemma 11 then shows that this polynomial (coefficients listed
on the right) is locally Pareto-optimal.

Figure 26: Pareto Test for Point #2.

Figure 27 shows the error curves of Point #3. Here, the alternating sequence
splits between the two error curves. This plot is a splendid example of Lemma 11.
Because this generalized alternating sequence has length 5, Lemma 11 verifies that
the polynomial under test is locally Pareto-optimal.

Figure 27: Pareto Test for Point #3.

Figure 28 shows the error curves of Point #4 and shows that the alternating
sequence resides in the upper plot. Looking at all these plots in sequence, we see
the alternating sequence starting in the lower plot (Point #1), splitting between the
lower and upper plots (Points #2 and #3), and moving into the upper plot (Point
#4). Lemma 11 applies in all cases and verifies that the polynomials under test are
locally Pareto-optimal.

Figure 28: Pareto Test for Point #4.


9  The  Kolmogorov  Approach 

The  Kolmogorov  approach  to  optimization  is  a  general  method  that  yields  surprisingly 
concrete  results  when  applied  to  the  objective  function 

$$\gamma(h) = \sup\{\Gamma(z, h(z)) : z \in Z\}.$$

Section 4 demonstrated the effectiveness of the Kolmogorov approach for
characterizing the minimizers of $\gamma(h)$ on the polynomials. The classical
alternating condition for polynomial approximation is generalized to a new
alternating condition derived from $\partial_2\Gamma(z, h(z))$. On the polynomials, we
have essentially a finite-dimensional and real-valued minimization problem.

In contrast, Section 5 applies the Kolmogorov approach to the disk algebra, an
infinite-dimensional domain consisting of complex-valued functions. The Kolmogorov
approach readily characterizes the minimizers in the disk algebra. Although this result
belongs to Helton and Merino [17], Section 5 shows that the Kolmogorov approach
provides a general method to attack these minimization problems. Section 5 also
shows how to link the polynomial minimizers to the disk algebra bounds obtained
by Helton and Merino [17]. This approach allows the engineer to “benchmark” these
suboptimal solutions against an absolute best bound. This benchmarking is a splendid
example of how pure mathematics can enhance traditional engineering [2], [4], [3].
Indeed, nothing drives an engineer to seek an optimal solution like striving against a
“best bound.”


Not only does the Kolmogorov approach give the basic results for these
minimization problems, it also generalizes to minimization problems with several
objective functions. Section 6 lifts the single-objective minimization problem to the
multiobjective minimization problem. Section 7 develops the multiobjective
Kolmogorov approach, and Section 8 applies this approach to multiobjective
optimization on the polynomials. The new alternating condition of the single-objective
case is generalized to a new alternating condition that sweeps over the objective
functions. The technical complication that stymies a complete characterization is the
possibility of “phase splitting.” Nevertheless, there exists enough theory to identify
locally optimal minimizers.
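The alternation count behind such a test can be sketched numerically. The code below is an illustration only, not the report's Lemma 11: it treats a grid point as critical when the performance function is within a relative tolerance of its maximum, takes the sign of $\partial_2\Gamma$ there, and counts sign alternations along $x$. It does not handle phase splitting, and the names `critical_signs` and `alternation_length` and the tolerance are assumptions. The example is the classical single-objective case: the best cubic approximation to $x^4$ on $[-1, 1]$ is $x^2 - 1/8$, whose error equioscillates.

```python
import math

def critical_signs(xs, h_vals, perf, dperf, tol=1e-3):
    """(x, sign of dperf) at grid points where perf is within tol of its maximum."""
    vals = [perf(x, h) for x, h in zip(xs, h_vals)]
    peak = max(vals)
    return [(x, math.copysign(1.0, dperf(x, h)))
            for x, h, v in zip(xs, h_vals, vals) if v >= (1.0 - tol) * peak]

def alternation_length(tagged):
    """Number of sign runs when the tagged points are ordered by x."""
    length, last = 0, 0.0
    for _, s in sorted(tagged):
        if s != last:
            length, last = length + 1, s
    return length

# Classical check: h(x) = x^2 - 1/8 approximating x^4 on [-1, 1].
xs = [-1.0 + 2.0 * i / 200 for i in range(201)]
h_vals = [x * x - 0.125 for x in xs]
perf = lambda x, h: (x ** 4 - h) ** 2      # squared error
dperf = lambda x, h: -2.0 * (x ** 4 - h)   # its variation with respect to h
n_alt = alternation_length(critical_signs(xs, h_vals, perf, dperf))  # 5 = N + 2, N = 3
```

For two objectives, one would concatenate the lists returned by `critical_signs` for each performance function before calling `alternation_length`, which mimics an alternating sequence that sweeps over both error curves.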

The  Kolmogorov  approach  should  also  apply  to  multiobjective  optimization  on 
the  disk  algebra.  Indeed,  Helton  and  Whittlesey  [18],  Helton  and  Vityaev  [16],  and 
Helton  and  Merino  [17]  generalize  the  single-objective  minimization  problem  to  a 
multiobjective  problem  on  the  disk  algebra. 


We  conclude  by  making  explicit  a  fruitful  point  of  contact  between  the  disk  algebra 
and  the  finite-dimensional  spaces  of  polynomials  in  a  robust  filter-design  problem. 
The  problem  is  to  make  a  trade-off  between  realizing  a  transfer  function  and  its 
associated sensitivity to design parameters [25].


For example, the transducer power gain $G_T$ of a low-pass ladder can be
parameterized as

$$G_T(h; j\omega) = \frac{1}{1 + |h(j\omega)|^2},$$

where $h(j\omega)$ is a real-valued polynomial and $\omega$ is the radial frequency. By
making a bilinear transform and a slight abuse of notation, the problem can be put on
the unit circle:

$$G_T(h; z) := \frac{1}{1 + |h(z)|^2}, \qquad (z = e^{j\theta}),$$

where $h(z)$ is a real-valued polynomial. The optimization of the transducer power
gain is the problem of finding a polynomial $h(z)$ so that $G_T(h)$ follows a
user-specified design.

We specify a Gaussian filter as plotted in Figure 29. The goal is to build a low-pass
ladder with a gain $G_T$ that is close to the Gaussian filter. One measure of “filter
error” is

$$\Gamma_1(z, h(z)) = (G_T(h; z) - G_{T,u}(z))^2, \qquad (z = e^{j\theta}).$$


Figure 29: User-specified Gaussian filter.


Allied with the design of the low-pass ladder is its sensitivity: the variation of
the gain as a function of its parameterizing polynomial,

$$G(h + \Delta h) = G(h) + 2\Re[\partial_h G(h)\,\Delta h] + \cdots.$$

With the variation of the gain given as

$$\partial_h G(h) = -G(h)^2\,\overline{h},$$

one measure of the “sensitivity” of the design is then

$$\Gamma_2(j\omega, h(j\omega)) = |\Re[\partial_h G(h; j\omega)]|^2 = |\Re[G(h)^2\,\overline{h}]|^2.$$

Consequently, the problem of minimizing

$$\gamma(h) = \begin{bmatrix} \gamma_1(h) \\ \gamma_2(h) \end{bmatrix}$$

is the problem of finding a filter of minimum sensitivity that is closest to the specified
design.
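The first-order expansion of the gain can be checked numerically at a single frequency sample. The sketch below assumes the gain $G(h) = 1/(1 + |h|^2)$ with variation $\partial_h G(h) = -G(h)^2\,\overline{h}$; the function names and the sample values of `h` and `dh` are hypothetical.

```python
def gain(h):
    """Transducer power gain G(h) = 1 / (1 + |h|^2) at one frequency sample."""
    return 1.0 / (1.0 + abs(h) ** 2)

def gain_variation(h):
    """Variation d_h G(h) = -G(h)^2 * conj(h)."""
    return -gain(h) ** 2 * h.conjugate()

def sensitivity(h):
    """Pointwise sensitivity measure |Re[d_h G(h)]|^2."""
    return gain_variation(h).real ** 2

h = 0.8 - 0.3j                 # hypothetical value of h at one frequency
dh = 1e-6 * (1.0 + 2.0j)       # small complex perturbation
exact = gain(h + dh) - gain(h)
first_order = 2.0 * (gain_variation(h) * dh).real  # 2 Re[d_h G(h) dh]
```

The difference between `exact` and `first_order` should be of second order in `dh`, which is what the expansion predicts. Taking the maximum of `sensitivity` over a frequency grid would give a sampled version of the second objective.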

Figure 30 plots random samples of $\gamma(h)$ in the Filter-Sensitivity plane for the
polynomials of third degree ($h \in \mathcal{P}_3$) as the blue dots. The red square is
the starting point for the minimizing method. The green squares are the numerical
minimizers.


Figure 30: Pareto surface in the Filter-Sensitivity plane.


The Goal Attainment Method computes these minimizers using the weight vector

$$w = \begin{bmatrix} \cos(\theta_w) \\ \sin(\theta_w) \end{bmatrix}.$$

As $\theta_w$ sweeps from 0° to 90°, the Goal Attainment Method sweeps out numerical
approximations to local Pareto points with images marked by the green diamonds.
This numerical approximation of the Pareto image shows an engineer the trade-off
between the filter accuracy and sensitivity. The curve generally shows that sensitivity
increases as the error decreases. The curve also hints at the existence of fascinating
fine structure. The single point that has near-optimal gain and low sensitivity
certainly attracts the attention of an engineer and brings the rest of the “connected”
Pareto image into question.

Figure  31  increases  the  degree  of  the  polynomials  from  3  to  6.  The  plot  reveals  that 
the  Pareto  image  does  have  a  fine  structure — a  fine  structure  of  “high-performance” 
points. 


Figure 31: Pareto surface for $\mathcal{P}_6$.


When both plots are put in the context of optimizing over a family of polynomials
$\mathcal{P}_N$ for $N \to \infty$, two issues become apparent. First is the problem of
determining if a given point belongs to the Pareto image. This problem is specific to
multiobjective optimization on the polynomials and to the general characterization
problem raised at the beginning of this report: can an answer be recognized? The
second issue puts both plots in the context of the best bounds that follow from
multiobjective optimization on the disk algebra. What would be very helpful for the
engineer is a plot of the best possible bounds attainable on the disk algebra. This
“ultimate Pareto image” would bound all the polynomial cases and let the engineer
trade off filter performance as a function of degree. Thus, this simple filter design
problem is an excellent point of departure for research in multiobjective optimization.